New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent rules incorrectly returning conflicting fixes to same position #2830
Prevent rules incorrectly returning conflicting fixes to same position #2830
Conversation
src/sqlfluff/core/linter/linter.py
Outdated
f"the same anchors. This is not supported, so the " | ||
f"fixes will not be applied. %r", | ||
fixes, | ||
) # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New error detection code.
@@ -1053,7 +1053,6 @@ def apply_fixes(self, dialect, rule_code, fixes): | |||
) | |||
# We've applied a fix here. Move on, this also consumes the | |||
# fix | |||
# TODO: Maybe deal with overlapping fixes later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this comment now that we're dealing with them (by warning and discarding)
[ | ||
SymbolSegment(raw=";", type="symbol", name="semicolon"), | ||
], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The originally reported bug was here.
NewlineSegment(), | ||
SymbolSegment(raw=";", type="symbol", name="semicolon"), | ||
], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was practically identical to the buggy code. I refactored it to use the same helper function, thus it gets the same fix.
src/sqlfluff/rules/L052.py
Outdated
if anchor_segment in whitespace_deletions: | ||
# Can't delete() and create_after() the same segment. Use replace() | ||
# instead. | ||
lintfix_fn = LintFix.replace | ||
whitespace_deletions = whitespace_deletions.select( | ||
lambda seg: seg is not anchor_segment | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the heart of the fix. It prevents having two LintFix
es with the same anchor.
New error seems to have blown up L036? Maybe more underlying bugs? |
Yes, L036 has a bug as well. Interesting that we hadn't noticed it before. I'll try and fix that in the same PR, as long as it's not a big nasty fix. |
Looking at one of the L036 failures, it is indeed returning multiple fixes with the same anchor. In this case, they are identical (both are |
Ok, I fixed the L036 bug. It appears L001 has a "duplicate anchors" bug as well! |
It turns out L001 was okay. In one test, it was returning two deletions with segments that had the same position info, but were different objects. I updated the PR to use a new class, |
Codecov Report
@@ Coverage Diff @@
## main #2830 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 163 163
Lines 12336 12419 +83
=========================================
+ Hits 12336 12419 +83
Continue to review full report at Codecov.
|
@tunetheweb, @WittierDinosaur: Ok, ready for review!! |
f = fix_buff.pop() | ||
# Look for identity not just equality. | ||
# This handles potential positioning ambiguity. | ||
if f.anchor is seg: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below, I reworked the core apply_fixes()
logic:
- Consumes a dictionary of
AnchorEditInfo
rather than a list of fixes - Add the ability to handle
create_before
andcreate_after
the same anchor (new requirement for L053)
Bonus: The new logic is more straightforward (dictionary lookup/removal versus making copies of lists and moving things between them). It's probably more efficient as well -- the old logic was scanning all the unused fixes each time, to try and match it against the current segment.
else: | ||
seg_buffer.append(seg) | ||
# Switch over the the unused list | ||
fixes = unused_fixes + fix_buff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how this tricky "end of loop" logic all goes away now that we're using a dictionary instead.
fixes_ += [ | ||
LintFix.delete(seg) | ||
for seg in move_after_select_clause | ||
if seg not in all_deletes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tunetheweb: I'd like your thoughts on a question. This PR adds a new feature that prohibits fixes with the same anchor with one exception: It's okay to have 2 fixes, one create_before
and one create_after
.
Should we also allow multiple deletes of the same segment? There's no ambiguity there, and it would avoid needing to "fix" this rule as well as L053.
It's kind of a philosophical question. If we decide to allow multiple deletes, we make it easier on rule writers. On the other hand, we're letting them be a bit sloppy. 🤷♂️ I could be convinced either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.... I think safer to not allow that. In theory it shouldn't be needed so the developer ease is not a strong enough argument in my mind. And curious why L036 currently does it? Looked at the code but wasn't immediately apparent to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not a "good" reason, and I am not that familiar with L036 details, but basically, there are a couple places that delete unnecessary whitespace, and there was no bookkeeping, so in some cases, the same whitespace gets deleted twice.
Happy to leave the fix checker "as is" for now (i.e. not allow multiple deletes).
# whitespace multiple times (i.e. for non-raw segments higher in the | ||
# tree). | ||
if not context.segment.is_raw(): | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we allowed multiple deletions, we wouldn't need this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing some coverage too.
src/sqlfluff/core/linter/linter.py
Outdated
if any( | ||
not info.is_valid for info in anchor_info.values() | ||
): # pragma: no cover | ||
message = ( | ||
f"Rule {crawler.code} returned multiple fixes with the " | ||
f"same anchor. This is only supported for create_before+" | ||
f"create_after, so the fixes will not be applied. {fixes!r}" | ||
) | ||
cls._report_duplicate_anchors_error(message) | ||
elif fixes == last_fixes: # pragma: no cover | ||
cls._warn_unfixable(crawler.code) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment explaining what each of these three parts are for?
First one I think it covered by the error message.
Second is what? When there are errors but no fixes?
Third is what the good case? We have fixes and they look valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Cases: | ||
* 1 fix of any type: Valid | ||
* 2 fixes: Valid if and only if types are create_before and create_after | ||
""" | ||
if self.total <= 1: | ||
# Definitely no duplicates if <= 1. | ||
return True | ||
if self.total != 2: # pragma: no cover | ||
# Definitely duplicates if > 2. | ||
return False | ||
# Special case: Ok to create before and after same segment. | ||
return self.create_before == 1 and self.create_after == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads confusingly. I think below is clearer.
Also should first case only allow == 1
? When would it be 0
? Or less than 0
? Should that instead fall through to the False case?
Cases: | |
* 1 fix of any type: Valid | |
* 2 fixes: Valid if and only if types are create_before and create_after | |
""" | |
if self.total <= 1: | |
# Definitely no duplicates if <= 1. | |
return True | |
if self.total != 2: # pragma: no cover | |
# Definitely duplicates if > 2. | |
return False | |
# Special case: Ok to create before and after same segment. | |
return self.create_before == 1 and self.create_after == 1 | |
Cases: | |
* 1 fix of any type: Valid | |
* 2 fixes: Valid if and only if types are create_before and create_after | |
""" | |
if self.total == 1: | |
# Definitely no duplicates if == 1. | |
return True | |
if self.total == 2: | |
# This is only OK for this special case: | |
return self.create_before == 1 and self.create_after == 1 | |
# Definitely duplicates if < 1 or > 2. | |
return False # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update it to something similar as you suggest.
As currently written, 0 will never occur because we only call is_valid
if there are fixes. But I'd prefer to treat 0 as valid because it's harmless and gives us a bit of future proofing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
src/sqlfluff/rules/L039.py
Outdated
if violations: | ||
# Check each violation. If any of its fixes uses the same anchor | ||
# as a previously returned fix, discard it. The linter can't handle | ||
# applying fixes like this. Skipping this issue is okay because it | ||
# will be detected and fixed during the next linter pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this code in L039? Feels like core code that should be in BaseRule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want BaseRule
to discard when multiple fixes have the same anchor, because we consider it an error (the is_valid
check).
L039 had duplicate anchors on this new test case (extracted from one of the .sql
fixtures):
test_excess_space_cast:
fail_str: |
select
'1' :: INT as id1,
'2'::int as id2
from table_a
fix_str: |
select
'1'::INT as id1,
'2'::int as id2
from table_a
L039 wants to make two fixes to the line '1' :: INT as id1,
.
- Replace excessively long whitespace with a single whitespace: " " -> " " (2 places)
- Entirely remove the whitespace around the
::
.
Thus, it's trying to both replace and delete the same whitespace. If we return both fixes, the core linter will (appropriately) complain and discard both fixes. This bookkeeping ensures that both get fixed, but it's split across two passes through the linter loop. There may be a smarter way to do this, but this approach seems reasonable. I'm trying really hard not to do big rewrites of existing rule code during these PRs -- the goal is to eliminate the critical errors but try and avoid going down a 🐰 hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks makes sense. Maybe add a comment saying, something like:
This rule works in two steps to remove unnecessary white space:
- Replace duplicate whitespace to one single whitespace
- Remove single white spaces if needed.
This can result in two delete being applied to same segment so area so check for that and replace with single delete.
); | ||
# Yes, the formatting looks bad, but that's because we're only running L053 | ||
# here. In the real world, other rules will tidy up the formatting. | ||
fix_str: "\n SELECT\n foo,\n bar,\n baz\n FROM mycte2\n;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using \n
when that's not what's used for the fail_str
? Shouldn't it be consistent? Would also make the initial space look like "bad".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because of a YAML limitation. The fix string has a blank line at the end, but the YAML parser doesn't pick it up; it assumes blank line means end of string. The fail_str
doesn't have a blank line. Think I should change it? I prefer to use "normal" multi-line strings when possible, using quoted strings with \n
or other escape sequences only when necessary or for readability. Happy to change the fail_str
if you like, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go ahead and change fail_str
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
) | ||
# Yes, the formatting looks bad, but that's because we're only running L053 | ||
# here. In the real world, other rules will tidy up the formatting. | ||
fix_str: " -- This\n SELECT\n foo,\n bar,\n baz\n FROM mycte2\n\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@tunetheweb: Ready for another review. |
@alanmcruickshank: You may be interested in the changes to @OTooleMichael: You may be interested because it adds a new "sanity check" to the fixes. Previously, rules could return conflicting fixes -- it would apply one fix and silently ignore the rest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @barrywhart ! Lots of clean up and should catch all these issues going forward.
Just going to suggest a rename of the PR for release notes to “Prevent rules incorrectly applying multiple fixes to same position.”
Brief summary of the change made
Fixes #2827
Changes:
Are there any other side effects of this change that we should be aware of?
Pull Request checklist
Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
.yml
rule test cases intest/fixtures/rules/std_rule_cases
..sql
/.yml
parser test cases intest/fixtures/dialects
(note YML files can be auto generated withtox -e generate-fixture-yml
).test/fixtures/linter/autofix
.Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.