Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang-format] Fix more bugs in isStartOfName() #72336

Merged
merged 1 commit into from
Nov 15, 2023
Merged

Conversation

owenca
Copy link
Contributor

@owenca owenca commented Nov 15, 2023

Fixed #72264.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 15, 2023

@llvm/pr-subscribers-clang-format

Author: Owen Pan (owenca)

Changes

Fixed #72264.


Full diff: https://github.com/llvm/llvm-project/pull/72336.diff

2 Files Affected:

  • (modified) clang/lib/Format/TokenAnnotator.cpp (+2-5)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+14)
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 03f3c3583f2ec44..b71f6daa4e14392 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -2018,10 +2018,6 @@ class AnnotatingParser {
                (!Line.MightBeFunctionDecl || Current.NestingLevel != 0)) {
       Contexts.back().FirstStartOfName = &Current;
       Current.setType(TT_StartOfName);
-      if (auto *PrevNonComment = Current.getPreviousNonComment();
-          PrevNonComment && PrevNonComment->is(TT_StartOfName)) {
-        PrevNonComment->setType(TT_Unknown);
-      }
     } else if (Current.is(tok::semi)) {
       // Reset FirstStartOfName after finding a semicolon so that a for loop
       // with multiple increment statements is not confused with a for loop
@@ -2210,7 +2206,8 @@ class AnnotatingParser {
       return false;
 
     if (const auto *NextNonComment = Tok.getNextNonComment();
-        !NextNonComment || NextNonComment->isPointerOrReference()) {
+        !NextNonComment || NextNonComment->isPointerOrReference() ||
+        NextNonComment->isOneOf(tok::identifier, tok::string_literal)) {
       return false;
     }
 
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index ed730307074963f..1c0fe60c45c7a87 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2373,6 +2373,20 @@ TEST_F(TokenAnnotatorTest, UnderstandsDoWhile) {
   EXPECT_TOKEN(Tokens[4], tok::kw_while, TT_DoWhile);
 }
 
+TEST_F(TokenAnnotatorTest, NotStartOfName) {
+  auto Tokens = annotate("#pragma clang diagnostic push");
+  ASSERT_EQ(Tokens.size(), 6u) << Tokens;
+  EXPECT_TOKEN(Tokens[2], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[3], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[4], tok::identifier, TT_Unknown);
+
+  Tokens = annotate("#pragma clang diagnostic ignored \"-Wzero-length-array\"");
+  ASSERT_EQ(Tokens.size(), 7u) << Tokens;
+  EXPECT_TOKEN(Tokens[2], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[3], tok::identifier, TT_Unknown);
+  EXPECT_TOKEN(Tokens[4], tok::identifier, TT_Unknown);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

@owenca owenca merged commit eaff083 into llvm:main Nov 15, 2023
4 checks passed
@owenca owenca deleted the 72264 branch November 15, 2023 22:31
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
@kadircet
Copy link
Member

kadircet commented Jan 3, 2024

hi @owenca looks like this is still regressing certain patterns, e.g:

int foo BAR;

annotations before your modifications to isStartOfName:

AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=int L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x216ebe0 Text='int'
 M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=220 Name=identifier L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x21a3268 Text='foo'
 M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=130 Name=identifier L=11 PPK=2 FakeLParens= FakeRParens=0 II=0x21a3298 Text='BAR'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=12 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'

new annotations:

AnnotatedTokens(L=0, P=0, T=5, C=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=int L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x558b37655290 Text='int'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=identifier L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x558b3769d410 Text='foo'
 M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=220 Name=identifier L=11 PPK=2 FakeLParens= FakeRParens=0 II=0x558b3769d440 Text='BAR'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=12 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'

as you can see foo is no longer recognized as StartOfName, resulting in drastically different penalties for breaking around the token and changing formatting for a lot of existing code.

clang-format previously recognized these type of trailing attribute macros without any extra user configuration. it's quite obvious in 199fc97#diff-3f6f57cda9809a57c5b79e22b4181b3f3aaac7216262d0ef44108f39b0443e9bR8484, you're explicitly adding an attribute macro that wasn't explicitly told before (google style didn't have any pre-configured macros up until efeb546).

not sure if this was deliberate, but it's resulting in a lot of changes for our codebase and even independent of that not recognizing identifier foo as StartOfName seem clearly wrong to me. since we're close to release cut can you quickly remedy this regression that has started with efeb546 and accumulated more commits on top?

@ilya-biryukov
Copy link
Contributor

I have played around with specializing new behavior to macro directives and it seems work well, I have sent #76804 for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[clang-format] Regression in clang-format-18
5 participants