Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Jinja whitespace handling in rules #1647

Conversation

barrywhart
Copy link
Member

@barrywhart barrywhart commented Oct 14, 2021

Brief summary of the change made

Fixes #1594, #1608
Makes progress on #1437

Are there any other side effects of this change that we should be aware of?

It changes the Jinja lex / raw file slicing, which is a scary change, but it replaces a hack with something more robust. This PR replaces #1614, the scary dbt PR.

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with python test/generate_parse_fixture_yml.py or by running tox locally).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@barrywhart barrywhart marked this pull request as draft October 14, 2021 11:00
@codecov
Copy link

codecov bot commented Oct 14, 2021

Codecov Report

Merging #1647 (7ad47e2) into main (3970297) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main     #1647   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          133       133           
  Lines         9317      9316    -1     
=========================================
- Hits          9317      9316    -1     
Impacted Files Coverage Δ
src/sqlfluff/core/linter/linted_file.py 100.00% <ø> (ø)
src/sqlfluff/core/templaters/base.py 100.00% <ø> (ø)
src/sqlfluff/core/templaters/jinja.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3970297...7ad47e2. Read the comment docs.

@barrywhart barrywhart changed the title Issue 1437: Robust Jinja raw/templated slice mapping Issue 1437: Replace hacky Jinja whitespace control with something cleaner that still works with raw/templated slice mapping Oct 14, 2021
@barrywhart barrywhart marked this pull request as ready for review October 14, 2021 14:47
@tunetheweb
Copy link
Member

BTW is this instead of #1614? Or on top of that?

If it's instead, then does it close #1594 and #1608 like it was going to?

@barrywhart
Copy link
Member Author

This PR potentially replaces the other one. I want to finish up the core work (e.g. handling comments), then I'll see if it addresses the other issues (or could easily be extended to do so).

The two PRs are definitely contradictory, in that the other one updates some code that this one removes. I'll mark them both as "Draft" until the outcome is clearer.

(
select
col_name,
{{- echo('col_name') -}} as col_name2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, sqlfluff fix was corrupting this templated line of code.

@barrywhart
Copy link
Member Author

@alanmcruickshank: This is ready for another review.

return self.sliced_file[-1].source_slice
return self.sliced_file[-1].source_slice # pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this gone out of cover? Is this still the right return value? Or should the first "We should never get here" clause include this (i.e. a >= instead of a >)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why -- I think this is one of those areas where there are no explicit tests, it's just tested implicitly by higher-level code. Presumably, updating the raw / templated mapping caused it to stop hitting this code. I can't say if it's no longer necessary or if there's just no test case that's hitting it anymore.

if elem_type == "data":
yield RawFileSlice(raw, "literal", idx)
idx += len(raw)
continue
str_buff += raw

if elem_type.endswith("_begin"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's the case. If you put this SQL in a test.sql file:

SELECT
    {{ 'col1,' -}}
    col2
FROM
    table1

And run sqlfluff parse test.sql and then remove the negative sign and rerun you'll see it's different (it's missing the newline and whitespace before col2 as expected when the -}} is used).

So it definitely does something. So confused why it's not needed. Or is the fact you've not coded that the real reason for the strange "after" in above test case?

test/core/templaters/jinja_test.py Show resolved Hide resolved
Comment on lines 146 to 157
# Note this is basically identical to the "basic_data" case above.
# "Right strip" is not actually a thing in Jinja.
RawTemplatedTestCase(
name="strip_right_data",
instr="""select
c1,
{{ 'c' -}}2 as user_id
""",
templated_str="""select
c1,
c2 as user_id
""",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid test case? There's no whitespace after the -}} so nothing for it to strip in this example. So no wonder it isn't doing anything!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced this test case with a new one that does have whitespace to strip (the newline).

SELECT
  {{ 'col1,' -}}
  col2

name="strip_both_data",
instr="""select
c1,
{{- 'c' -}}2 as user_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again nothing for the -}} to do here as no space after to strip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this test case to add a newline afterwards:

select
    c1,
    {{- 'c' -}}
2 as user_id

test/core/templaters/jinja_test.py Show resolved Hide resolved
(
select
col_name,
{{- echo('col_name') -}} as col_name2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the extra whitespace is EXACTLY the length of col_name. Is it possible you're not taking into account the length of the templated text?

Comment on lines +363 to +377
num_chars_skipped = in_str.index(raw, idx) - idx
if num_chars_skipped:
# Yes. It skipped over some characters. Compute a string
# containing the skipped characters.
skipped_str = in_str[idx : idx + num_chars_skipped]

# Sanity check: Verify that Jinja only skips over
# WHITESPACE, never anything else.
if not skipped_str.isspace(): # pragma: no cover
templater_logger.warning(
"Jinja lex() skipped non-whitespace: %s", skipped_str
)
# Treat the skipped whitespace as a literal.
yield RawFileSlice(skipped_str, "literal", idx)
idx += num_chars_skipped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing!!! Seems so simple.

@barrywhart
Copy link
Member Author

barrywhart commented Oct 15, 2021

@tunetheweb, @alanmcruickshank: I think this is ready for another look. Note that I reworked it a bit since the last review. Now stripped whitespace is added back to the slice it was stripped from, rather than being returned as a separate slice. I think this is more logical, because in the source space, we don't care much about what Jinja is going to do when it renders the template. This also reduces cases where SQLFluff won't apply a fix that spans 3 or more source slices, e.g.

{% set x = 56 -%}
SELECT 3, 4;

With this update, the query above can be fixed (moving 3 and 4 columns onto separate lines). Previously, if the right-stripped newline on line 1 was returned as a separate slice, then the fix spanned 3 slices (the set, the newline, and the SELECT text), and the fix would not be applied. I hope this explanation makes sense and y'all agree with the reasoning. We can easily handle it either way (whitespace as a separate slice or not).

A bit more context about this PR:

  • Main goal is to ensure the slice boundary indices align with the raw file. This was not working correctly with left whitespace stripping and is fixed now.
  • Secondary goal is to ensure right whitespace stripping result in similar slicing (stripped whitespace returned as a separate literal slice). The goal here is consistency; AFAIK, the old behavior was not causing any user-facing issues.
  • Other than fixing the file corruption issue, this is more of an internal refactoring that slightly improves SQLFluff's understanding of Jinja's template processing. I could easily see making future changes to this code as we work on future, user-facing linting issues; this PR is just "setting the stage".
  • This PR makes some progress on Jinja: Can we do a more robust mapping between expanded code and original templated code? #1437, but does not help with some of the trickier issues where regions of similar code, e.g. unparsable section with reference to ephemeral dbt model (with same-name CTE?) #1571 or duplicate code caused by a for loop confuses the raw <---> templated mapping and can corrupt files or apply fixes to the wrong place. I hope to make an attempt at that soon. 🤞

Barry Hart added 2 commits October 15, 2021 12:23
…ng when I added handling for right whitespace stripping
…ng' of https://github.com/barrywhart/sqlfluff into bhart-issue_1437_robust_jinja_raw_templated_slice_mapping
@barrywhart barrywhart marked this pull request as draft October 15, 2021 16:58
@barrywhart barrywhart force-pushed the bhart-issue_1437_robust_jinja_raw_templated_slice_mapping branch from 62c0b1c to d1cd2bc Compare October 15, 2021 17:08
@barrywhart barrywhart marked this pull request as ready for review October 15, 2021 17:11
@barrywhart
Copy link
Member Author

I had briefly pushed a change to rework the new PR -- don't know if you noticed it. It broke some tests, I think not because of a flaw in the code itself, but it was somehow causing the deeper raw <----> source mapping bug seen in #1571 to impact some of our own tests. So I rolled back to the prior, working version. It was possibly an interesting idea for later (include stripped whitespace in the same slice it was stripped from, not a separate slice).

@@ -382,31 +382,31 @@ def fix_string(self) -> Tuple[Any, bool]:
" - Skipping edit patch on non-unique templated content: %s",
enriched_patch,
)
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After my changes, this section of code was no longer covered by tests. I am not familiar with this code and had not touched it in this PR, so for now I added # pragma: no cover.

I ran the full test suite in main to determine which test(s) were hitting this code, and there was only one: test/rules/std_fix_auto_test.py::test__std_fix_auto[ansi-017_lintresult_fixes_cannot_span_block_boundaries]. I checked my PR to see if the jinja.py changes were involved in this test, and they aren't (i.e. it executes the ifs in the new code, but skips over the bodies). This makes sense, because the test doesn't use whitespace stripping. I'd rather not go any farther down this rabbit hole, as it could be bottomless...

@barrywhart barrywhart marked this pull request as draft October 15, 2021 21:58
@barrywhart barrywhart marked this pull request as ready for review October 15, 2021 23:55
@tunetheweb tunetheweb changed the title Issue 1437: Replace hacky Jinja whitespace control with something cleaner that still works with raw/templated slice mapping Improve Jinja whitespace handling in rules Oct 16, 2021
@barrywhart barrywhart dismissed alanmcruickshank’s stale review October 16, 2021 16:45

Addressed the requested changes.

@barrywhart barrywhart merged commit 9dca764 into sqlfluff:main Oct 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

L001 violation: sqlfluff fix removes and add characters
3 participants