Prevent rules incorrectly returning conflicting fixes to same position #2830

Merged

Conversation

barrywhart
Member

@barrywhart barrywhart commented Mar 9, 2022

Brief summary of the change made

Fixes #2827

Changes:

  • Fixes L052 (as noted in the issue)
  • Fixes L036, L050, L053 (which had other issues uncovered by CI or internal linter checks)
  • Changes core linter behavior for rules returning multiple fixes with same anchor segment.
    • Old behavior: Arbitrarily applies one of the fixes, silently discards other fixes with the same anchor. Note this means fixes are not atomic -- bad!!
    • New behavior: Logs a warning and ignores the whole set of fixes. (During automated tests, raises an error rather than logging.)
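The "new behavior" bullet can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `Fix` record rather than SQLFluff's real `LintFix`, and it mirrors only the behavior described in this summary (it does not include the create_before+create_after exception discussed later in the conversation):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fix:
    """Hypothetical stand-in for a lint fix; not SQLFluff's LintFix."""

    edit_type: str  # e.g. "delete", "replace", "create_after"
    anchor: object  # the segment the fix is attached to


def fixes_to_apply(fixes):
    """Return the fixes unchanged, or [] if any anchor is used more than once."""
    # Count by object identity, so equal-looking segments stay distinct.
    counts = Counter(id(fix.anchor) for fix in fixes)
    if any(n > 1 for n in counts.values()):
        # The real change logs a warning here (and raises during tests).
        return []
    return list(fixes)
```

Discarding the whole set keeps a rule's fixes atomic: either all of them are applied or none are.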

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@barrywhart barrywhart marked this pull request as draft March 9, 2022 19:16
f"the same anchors. This is not supported, so the "
f"fixes will not be applied. %r",
fixes,
) # pragma: no cover
Member Author

New error detection code.

@@ -1053,7 +1053,6 @@ def apply_fixes(self, dialect, rule_code, fixes):
)
# We've applied a fix here. Move on, this also consumes the
# fix
# TODO: Maybe deal with overlapping fixes later.
Member Author

Deleted this TODO comment now that we handle overlapping fixes (by warning and discarding them).

[
SymbolSegment(raw=";", type="symbol", name="semicolon"),
],
)
Member Author

@barrywhart barrywhart Mar 9, 2022

The originally reported bug was here.

NewlineSegment(),
SymbolSegment(raw=";", type="symbol", name="semicolon"),
],
)
Member Author

This code was practically identical to the buggy code. I refactored it to use the same helper function, thus it gets the same fix.

Comment on lines 247 to 253
if anchor_segment in whitespace_deletions:
    # Can't delete() and create_after() the same segment. Use replace()
    # instead.
    lintfix_fn = LintFix.replace
    whitespace_deletions = whitespace_deletions.select(
        lambda seg: seg is not anchor_segment
    )
Member Author

This is the heart of the fix. It prevents having two LintFixes with the same anchor.
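As a rough illustration of the idea in the snippet above, here is a minimal sketch that uses plain tuples instead of real `LintFix` objects. The function name `plan_fixes` and the tuple layout are assumptions made for this example, not part of the PR:

```python
def plan_fixes(anchor, whitespace_deletions, new_segments):
    """Return (edit_type, segment, payload) tuples with no repeated anchor."""
    if any(seg is anchor for seg in whitespace_deletions):
        # Deleting the anchor and creating after it would be two fixes on one
        # segment, so replace the anchor with the new segments instead.
        edit_type = "replace"
        whitespace_deletions = [
            seg for seg in whitespace_deletions if seg is not anchor
        ]
    else:
        edit_type = "create_after"
    plan = [(edit_type, anchor, new_segments)]
    plan += [("delete", seg, None) for seg in whitespace_deletions]
    return plan
```

Either way, each segment appears as the anchor of at most one planned edit.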

@barrywhart barrywhart marked this pull request as ready for review March 9, 2022 19:38
@barrywhart barrywhart marked this pull request as draft March 9, 2022 19:43
@WittierDinosaur
Contributor

New error seems to have blown up L036? Maybe more underlying bugs?

@barrywhart
Member Author

Yes, L036 has a bug as well. Interesting that we hadn't noticed it before. I'll try and fix that in the same PR, as long as it's not a big nasty fix.

@barrywhart
Member Author

Looking at one of the L036 failures, it is indeed returning multiple fixes with the same anchor. In this case, they are identical (both are deletes). That's why we haven't noticed previously. (If the fixes were different, we likely would've noticed, as with L052.) Should be a simple fix. 🤞

@barrywhart
Member Author

Ok, I fixed the L036 bug. It appears L001 has a "duplicate anchors" bug as well!

@tunetheweb
Member

Quoting @barrywhart: Ok, I fixed the L036 bug. It appears L001 has a "duplicate anchors" bug as well!

[image: 06C57055-C735-4B09-AA2F-6CDF6E392A37]

@barrywhart barrywhart changed the title Fix L052 bug deleting space after Snowflake SET statement Detect when a rule returns multiple fixes with same anchor, fix L036 and L052 Mar 9, 2022
@barrywhart
Member Author

It turns out L001 was okay. In one test, it was returning two deletions with segments that had the same position info, but were different objects. I updated the PR to use a new class, IdentitySet, that checks for membership by object identity rather than equality.
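To make the identity-versus-equality distinction concrete, here is a minimal sketch of such a container. It is an assumption about how a class like IdentitySet could work, not the code added in this PR, and `Seg` is a hypothetical stand-in for a segment whose equality depends only on its raw text and position:

```python
from dataclasses import dataclass


class IdentitySet:
    """Set-like container that tests membership with id(), not ==/hash."""

    def __init__(self, items=()):
        # Keeping a reference to each item keeps its id() stable.
        self._items = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self._items[id(item)] = item

    def __contains__(self, item):
        return id(item) in self._items

    def __len__(self):
        return len(self._items)


@dataclass(frozen=True)
class Seg:
    """Hypothetical segment: equal when raw text and position match."""

    raw: str
    pos: int
```

With two `Seg(" ", 3)` instances, a regular `set` treats them as the same element, while the identity-based container tells them apart.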

@barrywhart barrywhart marked this pull request as ready for review March 9, 2022 21:01
@codecov

codecov bot commented Mar 9, 2022

Codecov Report

Merging #2830 (808d92d) into main (e94005e) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #2830   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          163       163           
  Lines        12336     12419   +83     
=========================================
+ Hits         12336     12419   +83     
Impacted Files Coverage Δ
src/sqlfluff/core/linter/linter.py 100.00% <100.00%> (ø)
src/sqlfluff/core/parser/segments/base.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L036.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L039.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L050.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L052.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L053.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@barrywhart
Member Author

@tunetheweb, @WittierDinosaur: Ok, ready for review!!

@barrywhart barrywhart changed the title Detect when a rule returns multiple fixes with same anchor, fix L036 and L052 Warn on rules returning multiple fixes with same anchor; fix L036 and L052 Mar 9, 2022
@barrywhart barrywhart changed the title Warn on rules returning multiple fixes with same anchor; fix L036 and L052 Log critical message on rules returning multiple fixes with same anchor (unless 1 each of create_before+create_after); fix L036, L052, L053 Mar 10, 2022
f = fix_buff.pop()
# Look for identity not just equality.
# This handles potential positioning ambiguity.
if f.anchor is seg:
Member Author

Below, I reworked the core apply_fixes() logic:

  • Consumes a dictionary of AnchorEditInfo rather than a list of fixes
  • Adds the ability to handle create_before and create_after on the same anchor (new requirement for L053)

Bonus: The new logic is more straightforward (dictionary lookup/removal versus making copies of lists and moving things between them). It's probably more efficient as well -- the old logic was scanning all the unused fixes each time, to try and match it against the current segment.
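The dictionary-based approach can be sketched roughly as follows. This is a simplified illustration, not the reworked apply_fixes() itself: the function name `apply_edits`, the `(edit_type, new_segments)` tuples, and keying the dictionary on `id(anchor)` are all assumptions made for the example.

```python
def apply_edits(segments, edits_by_anchor):
    """Apply per-anchor edits in one pass; return (new_segments, leftover).

    edits_by_anchor maps id(anchor) to a list of (edit_type, new_segments).
    """
    pending = dict(edits_by_anchor)
    result = []
    for seg in segments:
        # One dictionary lookup per segment; pop() also consumes the edits.
        edits = pending.pop(id(seg), [])
        before = [s for kind, new in edits if kind == "create_before" for s in new]
        after = [s for kind, new in edits if kind == "create_after" for s in new]
        replaced = [new for kind, new in edits if kind == "replace"]
        deleted = any(kind == "delete" for kind, _ in edits)
        result.extend(before)
        if replaced:
            result.extend(replaced[0])
        elif not deleted:
            result.append(seg)
        result.extend(after)
    # Anything left in pending refers to an anchor that was never visited.
    return result, pending
```

Because each segment's edits are looked up and removed in one step, there is no separate "unused fixes" list to rebuild on every iteration.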

else:
seg_buffer.append(seg)
# Switch over the the unused list
fixes = unused_fixes + fix_buff
Member Author

Note how this tricky "end of loop" logic all goes away now that we're using a dictionary instead.

@barrywhart barrywhart marked this pull request as draft March 10, 2022 15:07
@barrywhart barrywhart changed the title Log critical message on rules returning multiple fixes with same anchor (unless 1 each of create_before+create_after); fix L036, L052, L053 Log critical message on rules returning multiple fixes with same anchor (unless 1 each of create_before+create_after); fix L036, L050, L052, L053 Mar 10, 2022
@barrywhart barrywhart marked this pull request as ready for review March 10, 2022 15:16
fixes_ += [
    LintFix.delete(seg)
    for seg in move_after_select_clause
    if seg not in all_deletes
Member Author

@tunetheweb: I'd like your thoughts on a question. This PR adds a new feature that prohibits fixes with the same anchor with one exception: It's okay to have 2 fixes, one create_before and one create_after.

Should we also allow multiple deletes of the same segment? There's no ambiguity there, and it would avoid needing to "fix" this rule as well as L053.

It's kind of a philosophical question. If we decide to allow multiple deletes, we make it easier on rule writers. On the other hand, we're letting them be a bit sloppy. 🤷‍♂️ I could be convinced either way.

Member

Hmmm.... I think safer to not allow that. In theory it shouldn't be needed so the developer ease is not a strong enough argument in my mind. And curious why L036 currently does it? Looked at the code but wasn't immediately apparent to me.

Member Author

There's not a "good" reason, and I am not that familiar with L036 details, but basically, there are a couple places that delete unnecessary whitespace, and there was no bookkeeping, so in some cases, the same whitespace gets deleted twice.

Happy to leave the fix checker "as is" for now (i.e. not allow multiple deletes).

# whitespace multiple times (i.e. for non-raw segments higher in the
# tree).
if not context.segment.is_raw():
    return None
Member Author

If we allowed multiple deletions, we wouldn't need this change.

Member

@tunetheweb tunetheweb left a comment

Still missing some coverage too.

Comment on lines 524 to 535
if any(
    not info.is_valid for info in anchor_info.values()
):  # pragma: no cover
    message = (
        f"Rule {crawler.code} returned multiple fixes with the "
        f"same anchor. This is only supported for create_before+"
        f"create_after, so the fixes will not be applied. {fixes!r}"
    )
    cls._report_duplicate_anchors_error(message)
elif fixes == last_fixes:  # pragma: no cover
    cls._warn_unfixable(crawler.code)
else:
Member

Can we add a comment explaining what each of these three parts are for?

The first one, I think, is covered by the error message.
The second is what? When there are errors but no fixes?
The third is what, the good case? We have fixes and they look valid?

Member Author

Will do.

Comment on lines 79 to 90
Cases:
* 1 fix of any type: Valid
* 2 fixes: Valid if and only if types are create_before and create_after
"""
if self.total <= 1:
    # Definitely no duplicates if <= 1.
    return True
if self.total != 2:  # pragma: no cover
    # Definitely duplicates if > 2.
    return False
# Special case: Ok to create before and after same segment.
return self.create_before == 1 and self.create_after == 1
Member

This reads confusingly. I think below is clearer.

Also should first case only allow == 1? When would it be 0? Or less than 0? Should that instead fall through to the False case?

Suggested change
- Cases:
- * 1 fix of any type: Valid
- * 2 fixes: Valid if and only if types are create_before and create_after
- """
- if self.total <= 1:
-     # Definitely no duplicates if <= 1.
-     return True
- if self.total != 2:  # pragma: no cover
-     # Definitely duplicates if > 2.
-     return False
- # Special case: Ok to create before and after same segment.
- return self.create_before == 1 and self.create_after == 1
+ Cases:
+ * 1 fix of any type: Valid
+ * 2 fixes: Valid if and only if types are create_before and create_after
+ """
+ if self.total == 1:
+     # Definitely no duplicates if == 1.
+     return True
+ if self.total == 2:
+     # This is only OK for this special case:
+     return self.create_before == 1 and self.create_after == 1
+ # Definitely duplicates if < 1 or > 2.
+ return False  # pragma: no cover

Member Author

I'll update it to something similar as you suggest.

As currently written, 0 will never occur because we only call is_valid if there are fixes. But I'd prefer to treat 0 as valid because it's harmless and gives us a bit of future proofing.
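For reference, here is a self-contained sketch of how such a per-anchor counter might look with 0 treated as valid. Only `total`, `create_before`, `create_after`, and `is_valid` appear in the snippets above; the `delete` and `replace` fields and the `add()` helper are assumptions added to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class AnchorEditInfo:
    """Counts of fixes, by type, that share one anchor (illustrative sketch)."""

    delete: int = 0  # assumed field
    replace: int = 0  # assumed field
    create_before: int = 0
    create_after: int = 0

    def add(self, edit_type: str) -> None:
        """Hypothetical helper: record one more fix of the given type."""
        setattr(self, edit_type, getattr(self, edit_type) + 1)

    @property
    def total(self) -> int:
        return self.delete + self.replace + self.create_before + self.create_after

    @property
    def is_valid(self) -> bool:
        if self.total <= 1:
            # 0 is harmless, and a single fix can never conflict with itself.
            return True
        if self.total == 2:
            # Only OK when creating both before and after the same segment.
            return self.create_before == 1 and self.create_after == 1
        # Three or more fixes on one anchor always conflict.
        return False
```

Two deletes on the same anchor give `total == 2` with no creates, so `is_valid` is false, which matches the decision above not to allow multiple deletes.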

Member

SGTM

Comment on lines 128 to 132
if violations:
    # Check each violation. If any of its fixes uses the same anchor
    # as a previously returned fix, discard it. The linter can't handle
    # applying fixes like this. Skipping this issue is okay because it
    # will be detected and fixed during the next linter pass.
Member

Why is this code in L039? Feels like core code that should be in BaseRule.

Member Author

We don't want BaseRule to discard when multiple fixes have the same anchor, because we consider it an error (the is_valid check).

L039 had duplicate anchors on this new test case (extracted from one of the .sql fixtures):

test_excess_space_cast:
  fail_str: |
    select
        '1'    ::   INT as id1,
        '2'::int as id2
    from table_a
  fix_str: |
    select
        '1'::INT as id1,
        '2'::int as id2
    from table_a

L039 wants to make two fixes to the line '1'    ::   INT as id1,.

  1. Replace excessively long whitespace with a single whitespace: "    " -> " " (2 places)
  2. Entirely remove the whitespace around the ::.

Thus, it's trying to both replace and delete the same whitespace. If we return both fixes, the core linter will (appropriately) complain and discard both fixes. This bookkeeping ensures that both get fixed, but it's split across two passes through the linter loop. There may be a smarter way to do this, but this approach seems reasonable. I'm trying really hard not to do big rewrites of existing rule code during these PRs -- the goal is to eliminate the critical errors but try and avoid going down a 🐰 hole.
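The bookkeeping described here can be sketched as below. This is an illustration under assumptions, not L039's actual code: each violation is modeled as a list of `(edit_type, anchor)` tuples, and `drop_conflicting` is a hypothetical name.

```python
def drop_conflicting(violations):
    """Keep a violation only if none of its fixes reuse an earlier anchor."""
    seen = set()  # id() of every anchor already claimed by a kept violation
    kept = []
    for fixes in violations:
        anchor_ids = {id(anchor) for _, anchor in fixes}
        if anchor_ids & seen:
            # Skip for now; the next linter pass will find and fix it.
            continue
        seen |= anchor_ids
        kept.append(fixes)
    return kept
```

In the test case above, the "replace with a single space" violation would be kept on the first pass, and the "remove the whitespace around ::" violation would be picked up on the next one.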

Member

Thanks, that makes sense. Maybe add a comment saying something like:

This rule works in two steps to remove unnecessary whitespace:

  1. Replace duplicate whitespace with a single whitespace.
  2. Remove single whitespaces if needed.

This can result in two deletes being applied to the same segment, so check for that and replace them with a single delete.

src/sqlfluff/rules/L052.py (resolved)
);
# Yes, the formatting looks bad, but that's because we're only running L053
# here. In the real world, other rules will tidy up the formatting.
fix_str: "\n SELECT\n foo,\n bar,\n baz\n FROM mycte2\n;\n"
Member

Why use \n when that's not what's used for the fail_str? Shouldn't it be consistent? It would also make the initial space look like "bad".

Member Author

It's because of a YAML limitation. The fix string has a blank line at the end, but the YAML parser doesn't pick it up; it assumes blank line means end of string. The fail_str doesn't have a blank line. Think I should change it? I prefer to use "normal" multi-line strings when possible, using quoted strings with \n or other escape sequences only when necessary or for readability. Happy to change the fail_str if you like, though!

Member Author

I'll go ahead and change fail_str.

Member Author

Done!

)
# Yes, the formatting looks bad, but that's because we're only running L053
# here. In the real world, other rules will tidy up the formatting.
fix_str: " -- This\n SELECT\n foo,\n bar,\n baz\n FROM mycte2\n\n"
Member

Ditto.

Member Author

Done!

@barrywhart
Member Author

@tunetheweb: Ready for another review.

@barrywhart
Member Author

@alanmcruickshank: You may be interested in the changes to BaseSegment.apply_fixes(). We now handle one case of "overlapping fixes": two fixes with the same anchor, one is create_before and one is create_after.

@OTooleMichael: You may be interested because it adds a new "sanity check" to the fixes. Previously, rules could return conflicting fixes -- it would apply one fix and silently ignore the rest.

Member

@tunetheweb tunetheweb left a comment

This is great work @barrywhart ! Lots of clean up and should catch all these issues going forward.

Just going to suggest a rename of the PR for release notes to “Prevent rules incorrectly applying multiple fixes to same position.”

@barrywhart barrywhart changed the title Log critical message on rules returning multiple fixes with same anchor (unless 1 each of create_before+create_after); fix L036, L050, L052, L053 Prevent rules incorrectly returning conflicting fixes to same position Mar 10, 2022
@barrywhart barrywhart merged commit 0e91f42 into sqlfluff:main Mar 10, 2022
Successfully merging this pull request may close these issues.

Snowflake - Fix Error when semicolon is preceeded by space
3 participants