Improve Jinja whitespace handling in rules #1647
Codecov Report
@@ Coverage Diff @@
## main #1647 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 133 133
Lines 9317 9316 -1
=========================================
- Hits 9317 9316 -1
test/fixtures/linter/autofix/ansi/016_no_fix_in_template_loops/after.sql
This PR potentially replaces the other one. I want to finish up the core work (e.g. handling comments), then I'll see if it addresses the other issues (or could easily be extended to do so). The two PRs are definitely contradictory, in that the other one updates some code that this one removes. I'll mark them both as "Draft" until the outcome is clearer.
…ng' of https://github.com/barrywhart/sqlfluff into bhart-issue_1437_robust_jinja_raw_templated_slice_mapping
(
select
col_name,
{{- echo('col_name') -}} as col_name2
Previously, `sqlfluff fix` was corrupting this templated line of code.
@alanmcruickshank: This is ready for another review.
- return self.sliced_file[-1].source_slice
+ return self.sliced_file[-1].source_slice  # pragma: no cover
Why has this gone out of cover? Is this still the right return value? Or should the first "We should never get here" clause include this (i.e. a `>=` instead of a `>`)?
I don't know why -- I think this is one of those areas where there are no explicit tests, it's just tested implicitly by higher-level code. Presumably, updating the raw / templated mapping caused it to stop hitting this code. I can't say if it's no longer necessary or if there's just no test case that's hitting it anymore.
if elem_type == "data":
    yield RawFileSlice(raw, "literal", idx)
    idx += len(raw)
    continue
str_buff += raw

if elem_type.endswith("_begin"):
I'm not sure that's the case. If you put this SQL in a `test.sql` file:
SELECT
{{ 'col1,' -}}
col2
FROM
table1
And run `sqlfluff parse test.sql`, then remove the negative sign and rerun, you'll see the output is different (it's missing the newline and whitespace before `col2`, as expected when the `-}}` is used).
So it definitely does something, and I'm confused about why it's not needed. Or is the fact that you've not coded that the real reason for the strange "after" in the above test case?
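To make the effect of the `-}}` marker concrete, here is a minimal Python sketch. It is a toy stand-in and not the Jinja engine: `toy_render` is a hypothetical helper that understands only `{{ 'literal' }}` expressions plus the `-` whitespace-control markers, and the indentation in the sample SQL is an assumption.

```python
import re


def toy_render(template: str) -> str:
    """Toy emulation of Jinja whitespace control (hypothetical helper).

    Handles only {{ 'literal' }} expressions. A leading "-" drops the
    whitespace before the tag; a trailing "-" drops the whitespace after it.
    """
    # "{{-" removes all whitespace immediately before the tag.
    template = re.sub(r"\s*\{\{-", "{{", template)
    # "-}}" removes all whitespace immediately after the tag.
    template = re.sub(r"-\}\}\s*", "}}", template)
    # Replace each remaining {{ 'literal' }} with the literal's contents.
    return re.sub(r"\{\{\s*'([^']*)'\s*\}\}", r"\1", template)


with_strip = "SELECT\n    {{ 'col1,' -}}\n    col2\nFROM\n    table1\n"
without_strip = with_strip.replace("-}}", "}}")

print(repr(toy_render(with_strip)))
print(repr(toy_render(without_strip)))
```

With the marker, the newline and indentation before `col2` disappear from the rendered text; without it, they survive. That is the difference the two `sqlfluff parse` runs described above would show.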
test/core/templaters/jinja_test.py
# Note this is basically identical to the "basic_data" case above.
# "Right strip" is not actually a thing in Jinja.
RawTemplatedTestCase(
    name="strip_right_data",
    instr="""select
c1,
{{ 'c' -}}2 as user_id
""",
    templated_str="""select
c1,
c2 as user_id
""",
Is this a valid test case? There's no whitespace after the `-}}`, so there is nothing for it to strip in this example. So no wonder it isn't doing anything!
I replaced this test case with a new one that does have whitespace to strip (the newline).
SELECT
{{ 'col1,' -}}
col2
test/core/templaters/jinja_test.py
name="strip_both_data",
instr="""select
c1,
{{- 'c' -}}2 as user_id
Again, there is nothing for the `-}}` to do here, as there is no space after it to strip.
Updated this test case to add a newline afterwards:
select
c1,
{{- 'c' -}}
2 as user_id
(
select
col_name,
{{- echo('col_name') -}} as col_name2
Interestingly, the extra whitespace is EXACTLY the length of `col_name`. Is it possible you're not taking into account the length of the templated text?
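As an illustration of why the templated length matters (not a diagnosis of the actual bug), the sketch below tracks source and templated offsets separately. `Piece` and `slice_pairs` are hypothetical names, and it assumes that `echo('col_name')` renders to `col_name`.

```python
from typing import List, NamedTuple, Tuple


class Piece(NamedTuple):
    source: str    # text as written in the template file
    rendered: str  # text after rendering (assumed, not produced by Jinja here)


def slice_pairs(pieces: List[Piece]) -> List[Tuple[slice, slice]]:
    """Return (source_slice, templated_slice) for each piece.

    Each offset advances by its own length, so the two positions drift
    apart whenever a tag renders to text of a different length.
    """
    pairs = []
    src_idx = 0
    tpl_idx = 0
    for piece in pieces:
        pairs.append(
            (
                slice(src_idx, src_idx + len(piece.source)),
                slice(tpl_idx, tpl_idx + len(piece.rendered)),
            )
        )
        src_idx += len(piece.source)
        tpl_idx += len(piece.rendered)
    return pairs


pieces = [
    Piece("select ", "select "),
    Piece("{{ echo('col_name') }}", "col_name"),  # assumed rendering
    Piece(" as col_name2", " as col_name2"),
]
for source_slice, templated_slice in slice_pairs(pieces):
    print(source_slice, templated_slice)
```

If a mapping advanced one offset by the other side's length, every later position would be off by the difference between the tag's source length and its rendered length.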
num_chars_skipped = in_str.index(raw, idx) - idx
if num_chars_skipped:
    # Yes. It skipped over some characters. Compute a string
    # containing the skipped characters.
    skipped_str = in_str[idx : idx + num_chars_skipped]

    # Sanity check: Verify that Jinja only skips over
    # WHITESPACE, never anything else.
    if not skipped_str.isspace():  # pragma: no cover
        templater_logger.warning(
            "Jinja lex() skipped non-whitespace: %s", skipped_str
        )
    # Treat the skipped whitespace as a literal.
    yield RawFileSlice(skipped_str, "literal", idx)
    idx += num_chars_skipped
This is amazing!!! Seems so simple.
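For readers who want to try the idea in isolation, here is a self-contained sketch of the same technique using only the standard library. `RawSlice` and `slices_with_skipped_whitespace` are hypothetical stand-ins for the real SQLFluff classes, and the token list is hand-written to imitate a lexer that omits stripped whitespace; it is not actual Jinja `lex()` output.

```python
import logging
from typing import Iterator, List, NamedTuple, Tuple

logger = logging.getLogger(__name__)


class RawSlice(NamedTuple):
    raw: str
    slice_type: str
    source_idx: int


def slices_with_skipped_whitespace(
    in_str: str, tokens: List[Tuple[str, str]]
) -> Iterator[RawSlice]:
    """Yield slices covering the source, re-inserting skipped whitespace.

    `tokens` is a list of (raw, slice_type) pairs in source order. Text in
    `in_str` that lies between two tokens is emitted as its own literal.
    """
    idx = 0
    for raw, slice_type in tokens:
        # Did the lexer jump ahead? If so, something was skipped.
        num_chars_skipped = in_str.index(raw, idx) - idx
        if num_chars_skipped:
            skipped_str = in_str[idx : idx + num_chars_skipped]
            # Only whitespace is expected to be skipped.
            if not skipped_str.isspace():
                logger.warning("Skipped non-whitespace: %s", skipped_str)
            yield RawSlice(skipped_str, "literal", idx)
            idx += num_chars_skipped
        yield RawSlice(raw, slice_type, idx)
        idx += len(raw)


source = "select\n    {{- 'c' -}}\n2 as user_id\n"
# Hand-written tokens: the whitespace around the tag is deliberately absent.
tokens = [
    ("select", "literal"),
    ("{{- 'c' -}}", "templated"),
    ("2 as user_id\n", "literal"),
]
slices = list(slices_with_skipped_whitespace(source, tokens))
for s in slices:
    print(s)
```

Joining the `raw` fields of the resulting slices reproduces the source string exactly, which is the property the raw file slicing needs. The sketch does not handle whitespace skipped after the final token.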
@tunetheweb, @alanmcruickshank: I think this is ready for another look. Note that I reworked it a bit since the last review. Now stripped whitespace is added back to the slice it was stripped from, rather than being returned as a separate slice. I think this is more logical, because in the source space, we don't care much about what Jinja is going to do when it renders the template. This also reduces cases where SQLFluff won't apply a fix that spans 3 or more source slices, e.g.
With this update, the query above can be fixed (moving A bit more context about this PR:
…ng when I added handling for right whitespace stripping
62c0b1c to d1cd2bc
I had briefly pushed a change to rework the new PR; I don't know if you noticed it. It broke some tests, I think not because of a flaw in the code itself, but because it was somehow causing the deeper raw <----> source mapping bug seen in #1571 to impact some of our own tests. So I rolled back to the prior, working version. It is possibly an interesting idea for later (include stripped whitespace in the same slice it was stripped from, not a separate slice).
@@ -382,31 +382,31 @@ def fix_string(self) -> Tuple[Any, bool]:
" - Skipping edit patch on non-unique templated content: %s",
enriched_patch,
)
continue
After my changes, this section of code was no longer covered by tests. I am not familiar with this code and had not touched it in this PR, so for now I added `# pragma: no cover`.
I ran the full test suite in `main` to determine which test(s) were hitting this code, and there was only one: `test/rules/std_fix_auto_test.py::test__std_fix_auto[ansi-017_lintresult_fixes_cannot_span_block_boundaries]`. I checked my PR to see if the `jinja.py` changes were involved in this test, and they aren't (i.e. it executes the `if`s in the new code, but skips over the bodies). This makes sense, because the test doesn't use whitespace stripping. I'd rather not go any farther down this rabbit hole, as it could be bottomless...
Addressed the requested changes.
Brief summary of the change made
Fixes #1594, #1608
Makes progress on #1437
Are there any other side effects of this change that we should be aware of?
It changes the Jinja lex / raw file slicing, which is a scary change, but it replaces a hack with something more robust. This PR replaces #1614, the scary dbt PR.
Pull Request checklist
Please confirm you have completed any of the necessary steps below.
- Included test cases to demonstrate any code changes, which may be one or more of the following:
  - `.yml` rule test cases in `test/fixtures/rules/std_rule_cases`.
  - `.sql`/`.yml` parser test cases in `test/fixtures/dialects` (note YML files can be auto generated with `python test/generate_parse_fixture_yml.py` or by running `tox` locally).
  - Autofix test cases in `test/fixtures/linter/autofix`.
- Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.