Prevent rules incorrectly returning conflicting fixes to same position #2830

barrywhart · 2022-03-09T19:16:31Z

Brief summary of the change made

Fixes #2827

Changes:

Fixes L052 (as noted in the issue)
Fixes L036, L050, L053 (which had other issues uncovered by CI or internal linter checks)
Changes core linter behavior for rules returning multiple fixes with same anchor segment.
- Old behavior: Arbitrarily applies one of the fixes, silently discards other fixes with the same anchor. Note this means fixes are not atomic -- bad!!
- New behavior: Logs a warning and ignores the whole set of fixes. (During automated tests, raises an error rather than logging.)
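
A minimal sketch of the "all or nothing" behavior described above, in its simplest form where any reused anchor invalidates the whole set. The `Fix` class, the `filter_fixes` function, and the `strict` flag are hypothetical names used only for illustration; they are not sqlfluff's API, and anchors are modelled as plain strings rather than segments.

```python
import logging
from collections import Counter
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Fix:
    """Hypothetical stand-in for a lint fix: an edit type plus an anchor."""

    edit_type: str  # e.g. "delete", "replace", "create_after"
    anchor: str  # assumption: a string key standing in for a segment


def filter_fixes(fixes, strict=False):
    """Return the fixes unchanged, or nothing at all if any anchor is reused."""
    counts = Counter(fix.anchor for fix in fixes)
    duplicated = sorted(anchor for anchor, n in counts.items() if n > 1)
    if not duplicated:
        return list(fixes)
    message = f"Multiple fixes share anchors {duplicated}; discarding all fixes."
    if strict:
        # Corresponds to "raises an error during automated tests".
        raise ValueError(message)
    logger.warning(message)
    # Discard the whole set so fixes stay atomic.
    return []
```

Discarding the entire set, rather than keeping whichever fix happens to be applied first, is what keeps a rule's fixes atomic.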

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
- .yml rule test cases in test/fixtures/rules/std_rule_cases.
- .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
- Full autofix test cases in test/fixtures/linter/autofix.
- Other.
Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.

barrywhart · 2022-03-09T19:17:23Z

src/sqlfluff/core/linter/linter.py

+                            f"the same anchors. This is not supported, so the "
+                            f"fixes will not be applied. %r",
+                            fixes,
+                        )  # pragma: no cover


New error detection code.

barrywhart · 2022-03-09T19:17:45Z

src/sqlfluff/core/parser/segments/base.py

@@ -1053,7 +1053,6 @@ def apply_fixes(self, dialect, rule_code, fixes):
                                )
                            # We've applied a fix here. Move on, this also consumes the
                            # fix
-                            # TODO: Maybe deal with overlapping fixes later.


Delete this comment now that we're dealing with them (by warning and discarding)

barrywhart · 2022-03-09T19:36:08Z

src/sqlfluff/rules/L052.py

+            [
+                SymbolSegment(raw=";", type="symbol", name="semicolon"),
+            ],
+        )


The originally reported bug was here.

barrywhart · 2022-03-09T19:36:57Z

src/sqlfluff/rules/L052.py

+                        NewlineSegment(),
+                        SymbolSegment(raw=";", type="symbol", name="semicolon"),
+                    ],
+                )


This code was practically identical to the buggy code. I refactored it to use the same helper function, thus it gets the same fix.

barrywhart · 2022-03-09T19:38:00Z

src/sqlfluff/rules/L052.py

+        if anchor_segment in whitespace_deletions:
+            # Can't delete() and create_after() the same segment. Use replace()
+            # instead.
+            lintfix_fn = LintFix.replace
+            whitespace_deletions = whitespace_deletions.select(
+                lambda seg: seg is not anchor_segment
+            )


This is the heart of the fix. It prevents having two LintFixes with the same anchor.
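
To illustrate the idea in isolation, here is a toy model, assuming fixes are plain `(edit_type, anchor, new_segments)` tuples instead of real `LintFix` objects. `Seg` and `build_fixes` are hypothetical names. When the anchor is itself scheduled for deletion, the sketch folds the delete and the create into a single replace, so no anchor appears twice.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Seg:
    """Hypothetical segment; eq=False keeps comparison by identity."""

    raw: str


def build_fixes(anchor, new_segments, whitespace_deletions):
    """Build fixes so that no anchor is used by more than one fix."""
    if any(seg is anchor for seg in whitespace_deletions):
        # Deleting the anchor and creating after it is the same as
        # replacing it, and a replace needs only one fix on that anchor.
        fixes = [("replace", anchor, new_segments)]
        whitespace_deletions = [
            seg for seg in whitespace_deletions if seg is not anchor
        ]
    else:
        fixes = [("create_after", anchor, new_segments)]
    fixes += [("delete", seg, None) for seg in whitespace_deletions]
    return fixes
```

In this model, `build_fixes(ws1, [semicolon], [ws1, ws2])` yields one replace anchored on `ws1` and one delete anchored on `ws2`.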

WittierDinosaur · 2022-03-09T19:55:42Z

New error seems to have blown up L036? Maybe more underlying bugs?

barrywhart · 2022-03-09T19:58:13Z

Yes, L036 has a bug as well. Interesting that we hadn't noticed it before. I'll try and fix that in the same PR, as long as it's not a big nasty fix.

barrywhart · 2022-03-09T20:05:21Z

Looking at one of the L036 failures, it is indeed returning multiple fixes with the same anchor. In this case, they are identical (both are deletes). That's why we haven't noticed previously. (If the fixes were different, we likely would've noticed, as with L052.) Should be a simple fix. 🤞

barrywhart · 2022-03-09T20:34:01Z

Ok, I fixed the L036 bug. It appears L001 has a "duplicate anchors" bug as well!

tunetheweb · 2022-03-09T20:37:34Z

Ok, I fixed the L036 bug. It appears L001 has a "duplicate anchors" bug as well!

barrywhart · 2022-03-09T21:00:38Z

It turns out L001 was okay. In one test, it was returning two deletions with segments that had the same position info, but were different objects. I updated the PR to use a new class, IdentitySet, that checks for membership by object identity rather than equality.
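
The difference between identity and equality can be shown with a small sketch. This is not the `IdentitySet` class from the PR, only one possible way to get the same membership semantics; `Seg` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Seg:
    """Hypothetical segment that compares equal when raw and pos match."""

    raw: str
    pos: int


class IdentitySet:
    """Set-like container that tests membership with `is`, not `==`."""

    def __init__(self, items=()):
        # Keyed by id(); storing the item keeps it alive so the id stays unique.
        self._items = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self._items[id(item)] = item

    def __contains__(self, item):
        return id(item) in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items.values())


a = Seg(" ", 5)
b = Seg(" ", 5)  # same position info, different object
seen = IdentitySet([a])
```

Here `a == b` is true, yet only `a` is a member of `seen`, which matches the L001 situation of two distinct segments sharing the same position info.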

codecov · 2022-03-09T21:14:28Z

Codecov Report

Merging #2830 (808d92d) into main (e94005e) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #2830   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          163       163           
  Lines        12336     12419   +83     
=========================================
+ Hits         12336     12419   +83

Impacted Files	Coverage Δ
src/sqlfluff/core/linter/linter.py	`100.00% <100.00%> (ø)`
src/sqlfluff/core/parser/segments/base.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L036.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L039.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L050.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L052.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L053.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

barrywhart · 2022-03-09T21:49:57Z

@tunetheweb, @WittierDinosaur: Ok, ready for review!!

barrywhart · 2022-03-10T14:52:06Z

src/sqlfluff/core/parser/segments/base.py

-                        f = fix_buff.pop()
-                        # Look for identity not just equality.
-                        # This handles potential positioning ambiguity.
-                        if f.anchor is seg:


Below, I reworked the core apply_fixes() logic:

- Consumes a dictionary of AnchorEditInfo rather than a list of fixes.
- Adds the ability to handle create_before and create_after on the same anchor (new requirement for L053).

Bonus: The new logic is more straightforward (dictionary lookup/removal versus making copies of lists and moving things between them). It's probably more efficient as well -- the old logic was scanning all the unused fixes each time, to try and match it against the current segment.
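
A rough sketch of the dictionary-based approach, under simplifying assumptions: segments are a flat list instead of a tree, fixes are `(edit_type, anchor, new_segments)` tuples, and the dictionary maps `id(anchor)` straight to a list of edits. `Seg` and this `apply_fixes` function are hypothetical and much simpler than `BaseSegment.apply_fixes()`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(eq=False)
class Seg:
    """Hypothetical segment, compared by identity."""

    raw: str


def apply_fixes(segments, fixes):
    """Apply fixes in a single pass using a dictionary keyed by anchor id."""
    pending = defaultdict(list)
    for edit_type, anchor, new_segments in fixes:
        pending[id(anchor)].append((edit_type, new_segments))

    result = []
    for seg in segments:
        before, middle, after = [], [seg], []
        # One lookup per segment; pop() also "consumes" the fixes.
        for edit_type, new_segments in pending.pop(id(seg), []):
            if edit_type == "create_before":
                before.extend(new_segments)
            elif edit_type == "create_after":
                after.extend(new_segments)
            elif edit_type == "delete":
                middle = []
            elif edit_type == "replace":
                middle = list(new_segments)
            else:
                raise ValueError(f"Unknown edit type: {edit_type}")
        result.extend(before + middle + after)
    return result
```

Because each segment does a single dictionary lookup, there is no need to rescan a list of unused fixes, and a create_before plus a create_after on the same anchor compose naturally.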

barrywhart · 2022-03-10T14:58:47Z

src/sqlfluff/core/parser/segments/base.py

                    else:
                        seg_buffer.append(seg)
-                # Switch over the the unused list
-                fixes = unused_fixes + fix_buff


Note how this tricky "end of loop" logic all goes away now that we're using a dictionary instead.

barrywhart · 2022-03-10T15:23:34Z

src/sqlfluff/rules/L036.py

+                        fixes_ += [
+                            LintFix.delete(seg)
+                            for seg in move_after_select_clause
+                            if seg not in all_deletes


@tunetheweb: I'd like your thoughts on a question. This PR adds a new feature that prohibits fixes with the same anchor with one exception: It's okay to have 2 fixes, one create_before and one create_after.

Should we also allow multiple deletes of the same segment? There's no ambiguity there, and it would avoid needing to "fix" this rule as well as L053.

It's kind of a philosophical question. If we decide to allow multiple deletes, we make it easier on rule writers. On the other hand, we're letting them be a bit sloppy. 🤷‍♂️ I could be convinced either way.

Hmmm.... I think safer to not allow that. In theory it shouldn't be needed so the developer ease is not a strong enough argument in my mind. And curious why L036 currently does it? Looked at the code but wasn't immediately apparent to me.

There's not a "good" reason, and I am not that familiar with L036 details, but basically, there are a couple places that delete unnecessary whitespace, and there was no bookkeeping, so in some cases, the same whitespace gets deleted twice.

Happy to leave the fix checker "as is" for now (i.e. not allow multiple deletes).

barrywhart · 2022-03-10T15:24:33Z

src/sqlfluff/rules/L050.py

+        # whitespace multiple times (i.e. for non-raw segments higher in the
+        # tree).
+        if not context.segment.is_raw():
+            return None


If we allowed multiple deletions, we wouldn't need this change.

tunetheweb

Still missing some coverage too.

tunetheweb · 2022-03-10T19:22:10Z

src/sqlfluff/core/linter/linter.py

+                    if any(
+                        not info.is_valid for info in anchor_info.values()
+                    ):  # pragma: no cover
+                        message = (
+                            f"Rule {crawler.code} returned multiple fixes with the "
+                            f"same anchor. This is only supported for create_before+"
+                            f"create_after, so the fixes will not be applied. {fixes!r}"
+                        )
+                        cls._report_duplicate_anchors_error(message)
+                    elif fixes == last_fixes:  # pragma: no cover
                        cls._warn_unfixable(crawler.code)
                    else:


Can we add a comment explaining what each of these three parts is for?

The first one, I think, is covered by the error message.
The second is what? When there are errors but no fixes?
The third is what, the good case? We have fixes and they look valid?

tunetheweb · 2022-03-10T19:58:04Z

src/sqlfluff/core/parser/segments/base.py

+        Cases:
+        * 1 fix of any type: Valid
+        * 2 fixes: Valid if and only if types are create_before and create_after
+        """
+        if self.total <= 1:
+            # Definitely no duplicates if <= 1.
+            return True
+        if self.total != 2:  # pragma: no cover
+            # Definitely duplicates if > 2.
+            return False
+        # Special case: Ok to create before and after same segment.
+        return self.create_before == 1 and self.create_after == 1


This reads confusingly. I think below is clearer.

Also should first case only allow == 1? When would it be 0? Or less than 0? Should that instead fall through to the False case?

Suggested change

-        Cases:
-        * 1 fix of any type: Valid
-        * 2 fixes: Valid if and only if types are create_before and create_after
-        """
-        if self.total <= 1:
-            # Definitely no duplicates if <= 1.
-            return True
-        if self.total != 2:  # pragma: no cover
-            # Definitely duplicates if > 2.
-            return False
-        # Special case: Ok to create before and after same segment.
-        return self.create_before == 1 and self.create_after == 1
+        Cases:
+        * 1 fix of any type: Valid
+        * 2 fixes: Valid if and only if types are create_before and create_after
+        """
+        if self.total == 1:
+            # Definitely no duplicates if == 1.
+            return True
+        if self.total == 2:
+            # This is only OK for this special case:
+            return self.create_before == 1 and self.create_after == 1
+        # Definitely duplicates if < 1 or > 2.
+        return False  # pragma: no cover

I'll update it to something similar as you suggest.

As currently written, 0 will never occur because we only call is_valid if there are fixes. But I'd prefer to treat 0 as valid because it's harmless and gives us a bit of future proofing.
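
One way the check could look once both points are combined (the clearer ordering from the suggestion, plus treating 0 as valid) is sketched below. The names `AnchorEditInfo`, `total`, `create_before`, `create_after`, and `is_valid` come from the diff; the `delete` and `replace` counters and the `add()` helper are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AnchorEditInfo:
    """Counts of each kind of fix that targets a single anchor."""

    delete: int = 0  # assumption: field name chosen for this sketch
    replace: int = 0  # assumption: field name chosen for this sketch
    create_before: int = 0
    create_after: int = 0

    def add(self, edit_type: str) -> None:
        """Hypothetical helper: record one more fix of the given type."""
        setattr(self, edit_type, getattr(self, edit_type) + 1)

    @property
    def total(self) -> int:
        return self.delete + self.replace + self.create_before + self.create_after

    @property
    def is_valid(self) -> bool:
        if self.total <= 1:
            # Zero fixes is harmless; one fix cannot conflict with itself.
            return True
        if self.total == 2:
            # The only supported pair: one create_before and one create_after.
            return self.create_before == 1 and self.create_after == 1
        # Three or more fixes on one anchor always conflict.
        return False
```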

tunetheweb · 2022-03-10T20:00:49Z

src/sqlfluff/rules/L039.py

+        if violations:
+            # Check each violation. If any of its fixes uses the same anchor
+            # as a previously returned fix, discard it. The linter can't handle
+            # applying fixes like this. Skipping this issue is okay because it
+            # will be detected and fixed during the next linter pass.


Why is this code in L039? Feels like core code that should be in BaseRule.

We don't want BaseRule to discard when multiple fixes have the same anchor, because we consider it an error (the is_valid check).

L039 had duplicate anchors on this new test case (extracted from one of the .sql fixtures):

test_excess_space_cast:
  fail_str: |
    select
        '1' :: INT as id1,
        '2'::int as id2
    from table_a
  fix_str: |
    select
        '1'::INT as id1,
        '2'::int as id2
    from table_a

L039 wants to make two fixes to the line '1' :: INT as id1,:

- Replace excessively long whitespace with a single space (2 places).
- Entirely remove the whitespace around the ::.

Thus, it's trying to both replace and delete the same whitespace. If we return both fixes, the core linter will (appropriately) complain and discard both fixes. This bookkeeping ensures that both get fixed, but it's split across two passes through the linter loop. There may be a smarter way to do this, but this approach seems reasonable. I'm trying really hard not to do big rewrites of existing rule code during these PRs -- the goal is to eliminate the critical errors but try and avoid going down a 🐰 hole.
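
The bookkeeping described here might look roughly like the following sketch. It assumes each violation is simply a list of `(edit_type, anchor)` fixes; `drop_conflicting_violations` is a hypothetical name, not the code added to L039.

```python
def drop_conflicting_violations(violations):
    """Keep a violation only if none of its anchors was already used.

    Skipped violations are not lost: the assumption is that the next
    linter pass will detect them again and fix them then.
    """
    seen_anchor_ids = set()
    kept = []
    for fixes in violations:
        anchor_ids = {id(anchor) for _, anchor in fixes}
        if anchor_ids & seen_anchor_ids:
            # An earlier violation already edits one of these anchors.
            continue
        seen_anchor_ids |= anchor_ids
        kept.append(fixes)
    return kept
```

With a replace and a delete aimed at the same whitespace segment, only the first one returned survives the current pass, and the other is picked up on the following pass.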

Thanks, makes sense. Maybe add a comment saying something like:

This rule works in two steps to remove unnecessary whitespace:

- Replace duplicate whitespace with a single whitespace.
- Remove single whitespaces if needed.

This can result in two deletes being applied to the same segment or area, so check for that and replace them with a single delete.

tunetheweb · 2022-03-10T20:05:14Z

test/fixtures/rules/std_rule_cases/L053.yml

+    );
+  # Yes, the formatting looks bad, but that's because we're only running L053
+  # here. In the real world, other rules will tidy up the formatting.
+  fix_str: "\n    SELECT\n        foo,\n        bar,\n        baz\n    FROM mycte2\n;\n"


Why using \n when that's not what's used for the fail_str? Shouldn't it be consistent? Would also make the initial space look like "bad".

It's because of a YAML limitation. The fix string has a blank line at the end, but the YAML parser doesn't pick it up; it assumes blank line means end of string. The fail_str doesn't have a blank line. Think I should change it? I prefer to use "normal" multi-line strings when possible, using quoted strings with \n or other escape sequences only when necessary or for readability. Happy to change the fail_str if you like, though!

I'll go ahead and change fail_str.

tunetheweb · 2022-03-10T20:05:20Z

test/fixtures/rules/std_rule_cases/L053.yml

+    )
+  # Yes, the formatting looks bad, but that's because we're only running L053
+  # here. In the real world, other rules will tidy up the formatting.
+  fix_str: " -- This\n    SELECT\n        foo,\n        bar,\n        baz\n    FROM mycte2\n\n"


barrywhart · 2022-03-10T21:42:27Z

@tunetheweb: Ready for another review.

barrywhart · 2022-03-10T21:45:45Z

@alanmcruickshank: You may be interested in the changes to BaseSegment.apply_fixes(). We now handle one case of "overlapping fixes": two fixes with the same anchor, one is create_before and one is create_after.

@OTooleMichael: You may be interested because it adds a new "sanity check" to the fixes. Previously, rules could return conflicting fixes -- it would apply one fix and silently ignore the rest.

tunetheweb

This is great work @barrywhart ! Lots of clean up and should catch all these issues going forward.

Just going to suggest a rename of the PR for release notes to “Prevent rules incorrectly applying multiple fixes to same position.”

Fix L052 bug deleting space after Snowflake SET statement

4de2882

barrywhart marked this pull request as draft March 9, 2022 19:16

Make a similar fix elsewhere in L052, DRY up the code

65a95b0

barrywhart marked this pull request as ready for review March 9, 2022 19:38

barrywhart requested review from tunetheweb and WittierDinosaur March 9, 2022 19:38

barrywhart marked this pull request as draft March 9, 2022 19:43

Update tests to raise error on fixes with duplicate anchors

e51aa4b

Merge branch 'main' into bhart-issue_2827_l052_duplicate_anchors

9e8a5f0

Fix L036 bug: Deleting the same segment multiple times

89dbc95

Rework the fixes to use object *identity*, not *equality*

bf87ff9

barrywhart changed the title ~~Fix L052 bug deleting space after Snowflake SET statement~~ Detect when a rule returns multiple fixes with same anchor, fix L036 and L052 Mar 9, 2022

Barry Hart added 2 commits March 9, 2022 15:55

Comments

12a9383

Update L052 to use IdentitySet

119400d

barrywhart marked this pull request as ready for review March 9, 2022 21:01

Coverage checker

c75c1bd

barrywhart changed the title ~~Detect when a rule returns multiple fixes with same anchor, fix L036 and L052~~ Warn on rules returning multiple fixes with same anchor; fix L036 and L052 Mar 9, 2022

Barry Hart added 4 commits March 9, 2022 20:00

Update L039 to avoid returning multiple fixes with same anchors

6c4b0fa

Add test case for fixed L039 issue

dfd16b0

Fix one of the post-fix parse error bugs in L053

bc2b727

Fix L053 bug, allow create_before + create_after for same anchor

b6435e7

barrywhart changed the title ~~Warn on rules returning multiple fixes with same anchor; fix L036 and L052~~ Log critical message on rules returning multiple fixes with same anchor (unless 1 each of create_before+create_after); fix L036, L052, L053 Mar 10, 2022

Barry Hart added 2 commits March 10, 2022 09:53

Tweaks

49db007

Fix type annotation

d736c60

Comments, tweaks

4fc8f8d

barrywhart marked this pull request as draft March 10, 2022 15:07

Fix bug in L050, fix missing space in message

8aa9170

barrywhart marked this pull request as ready for review March 10, 2022 15:16

Barry Hart added 2 commits March 10, 2022 10:25

Remove commented-out code

76f8ba1

Add "pragma: no cover"

6548269

tunetheweb requested changes Mar 10, 2022

Barry Hart added 3 commits March 10, 2022 15:57

Add "pragma: no cover"

dfdd131

PR review

ffd746c

More L039 comments

38a0b0e

barrywhart requested a review from alanmcruickshank March 10, 2022 21:42

tunetheweb approved these changes Mar 10, 2022

Discard fixes from lint results if tehre are conflicts

808d92d

barrywhart changed the title ~~Log critical message on rules returning multiple fixes with same anchor (unless 1 each of create_before+create_after); fix L036, L050, L052, L053~~ Prevent rules incorrectly returning conflicting fixes to same position Mar 10, 2022

barrywhart merged commit 0e91f42 into sqlfluff:main Mar 10, 2022

tunetheweb mentioned this pull request Mar 11, 2022

L042 Autofix - (CTE hoisting) Super Working PR #2795

Merged

9 tasks
