[clang-format] Fix more bugs in isStartOfName() #72336

owenca · 2023-11-15T03:26:23Z

Fixed #72264.

Fixed llvm#72264.

llvmbot · 2023-11-15T03:26:51Z

@llvm/pr-subscribers-clang-format

Author: Owen Pan (owenca)

Changes

Fixed #72264.

Full diff: https://github.com/llvm/llvm-project/pull/72336.diff

2 Files Affected:

(modified) clang/lib/Format/TokenAnnotator.cpp (+2-5)
(modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+14)

diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 03f3c3583f2ec44..b71f6daa4e14392 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -2018,10 +2018,6 @@ class AnnotatingParser {
                (!Line.MightBeFunctionDecl || Current.NestingLevel != 0)) {
       Contexts.back().FirstStartOfName = &Current;
       Current.setType(TT_StartOfName);
-      if (auto *PrevNonComment = Current.getPreviousNonComment();
-          PrevNonComment && PrevNonComment->is(TT_StartOfName)) {
-        PrevNonComment->setType(TT_Unknown);
-      }
     } else if (Current.is(tok::semi)) {
       // Reset FirstStartOfName after finding a semicolon so that a for loop
       // with multiple increment statements is not confused with a for loop
@@ -2210,7 +2206,8 @@ class AnnotatingParser {
       return false;
 
     if (const auto *NextNonComment = Tok.getNextNonComment();
-        !NextNonComment || NextNonComment->isPointerOrReference()) {
+        !NextNonComment || NextNonComment->isPointerOrReference() ||
+        NextNonComment->isOneOf(tok::identifier, tok::string_literal)) {
       return false;
     }
 
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index ed730307074963f..1c0fe60c45c7a87 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2373,6 +2373,20 @@ TEST_F(TokenAnnotatorTest, UnderstandsDoWhile) {
   EXPECT_TOKEN(Tokens[4], tok::kw_while, TT_DoWhile);
 }
 
+TEST_F(TokenAnnotatorTest, NotStartOfName) {
+  auto Tokens = annotate("#pragma clang diagnostic push");
+  ASSERT_EQ(Tokens.size(), 6u) << Tokens;
+  EXPECT_TOKEN(Tokens[2], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[3], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[4], tok::identifier, TT_Unknown);
+
+  Tokens = annotate("#pragma clang diagnostic ignored \"-Wzero-length-array\"");
+  ASSERT_EQ(Tokens.size(), 7u) << Tokens;
+  EXPECT_TOKEN(Tokens[2], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[3], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[4], tok::identifier, TT_Unknown);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

Fixed llvm#72264.

kadircet · 2024-01-03T06:56:36Z

hi @owenca looks like this is still regressing certain patterns, e.g:

int foo BAR;

annotations before your modifications to isStartOfName:

AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=int L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x216ebe0 Text='int'
 M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=220 Name=identifier L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x21a3268 Text='foo'
 M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=130 Name=identifier L=11 PPK=2 FakeLParens= FakeRParens=0 II=0x21a3298 Text='BAR'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=12 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'

new annotations:

AnnotatedTokens(L=0, P=0, T=5, C=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=int L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x558b37655290 Text='int'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=identifier L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x558b3769d410 Text='foo'
 M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=220 Name=identifier L=11 PPK=2 FakeLParens= FakeRParens=0 II=0x558b3769d440 Text='BAR'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=12 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'

as you can see foo is no longer recognized as StartOfName, resulting in drastically different penalties for breaking around the token and changing formatting for a lot of existing code.

clang-format previously recognized these type of trailing attribute macros without any extra user configuration. it's quite obvious in 199fc97#diff-3f6f57cda9809a57c5b79e22b4181b3f3aaac7216262d0ef44108f39b0443e9bR8484, you're explicitly adding an attribute macro that wasn't explicitly told before (google style didn't have any pre-configured macros up until efeb546).

not sure if this was deliberate, but it's resulting in a lot of changes for our codebase and even independent of that not recognizing identifier foo as StartOfName seem clearly wrong to me. since we're close to release cut can you quickly remedy this regression that has started with efeb546 and accumulated more commits on top?

ilya-biryukov · 2024-01-03T10:20:44Z

I have played around with specializing new behavior to macro directives and it seems work well, I have sent #76804 for review.

[clang-format] Fix more bugs in isStartOfName()

832b6dc

Fixed llvm#72264.

llvmbot added the clang-format label Nov 15, 2023

HazardyKnusperkeks approved these changes Nov 15, 2023

View reviewed changes

owenca merged commit eaff083 into llvm:main Nov 15, 2023
4 checks passed

owenca deleted the 72264 branch November 15, 2023 22:31

zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023

[clang-format] Fix more bugs in isStartOfName() (llvm#72336)

f880176

Fixed llvm#72264.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[clang-format] Fix more bugs in isStartOfName() #72336

[clang-format] Fix more bugs in isStartOfName() #72336

owenca commented Nov 15, 2023

llvmbot commented Nov 15, 2023

kadircet commented Jan 3, 2024

ilya-biryukov commented Jan 3, 2024

[clang-format] Fix more bugs in isStartOfName() #72336

[clang-format] Fix more bugs in isStartOfName() #72336

Conversation

owenca commented Nov 15, 2023

llvmbot commented Nov 15, 2023

kadircet commented Jan 3, 2024

ilya-biryukov commented Jan 3, 2024