Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libTooling] Fix getFileRangeForEdit for inner nested template types #87673

Merged
merged 3 commits into from
Apr 5, 2024

Conversation

tJener
Copy link
Contributor

@tJener tJener commented Apr 4, 2024

When there is >> in source from the right brackets of a nested template, the end location of the inner template points into a scratch space with a single > token. This prevents the lexer from seeing the >> token in the original source.

However, this causes the range to appear to be partially in a macro, and is problematic if we are trying to avoid ranges with any macro expansions.

This change detects this odd case in token ranges, converting it to the corresponding character range without the expansion.

When there is `>>` in source from the right brackets of a nested
template, the end location of the inner template points intto a
scratch space with a single `>` token. This prevents the lexer from
seeing the `>>` token in the original source.

However, this causes the range to appear to be partially in a macro,
and is problematic if we are trying to avoid ranges with any macro
expansions.

This change detects this odd case in token ranges, converting it to
the corresponding character range without the expansion.
@tJener tJener requested a review from ymand April 4, 2024 18:30
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 4, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 4, 2024

@llvm/pr-subscribers-clang

Author: Eric Li (tJener)

Changes

When there is >> in source from the right brackets of a nested template, the end location of the inner template points intto a scratch space with a single > token. This prevents the lexer from seeing the >> token in the original source.

However, this causes the range to appear to be partially in a macro, and is problematic if we are trying to avoid ranges with any macro expansions.

This change detects this odd case in token ranges, converting it to the corresponding character range without the expansion.


Full diff: https://github.com/llvm/llvm-project/pull/87673.diff

2 Files Affected:

  • (modified) clang/lib/Tooling/Transformer/SourceCode.cpp (+56-5)
  • (modified) clang/unittests/Tooling/SourceCodeTest.cpp (+29)
diff --git a/clang/lib/Tooling/Transformer/SourceCode.cpp b/clang/lib/Tooling/Transformer/SourceCode.cpp
index 6aae834b0db563..114f6ba4833cb0 100644
--- a/clang/lib/Tooling/Transformer/SourceCode.cpp
+++ b/clang/lib/Tooling/Transformer/SourceCode.cpp
@@ -101,6 +101,56 @@ static bool spelledInMacroDefinition(SourceLocation Loc,
   return false;
 }
 
+// Returns the expansion loc of `Loc` if `Loc` is the location of the `>` token
+// of an inner nested template, where the inner and outer templates end brackets
+// are spelled as `>>`.
+//
+// Clang handles the `>>` in nested templates by placing the `SourceLocation`
+// of the inner template end bracket in scratch space. This forces it to be a
+// separate token (otherwise it would be lexed as `>>`), but that means it also
+// looks like a macro.
+static std::optional<SourceLocation> getExpansionForNestedTemplateGreater(
+    SourceLocation Loc, const SourceManager &SM, const LangOptions &LangOpts) {
+  if (Loc.isMacroID()) {
+    auto SpellingLoc = SM.getSpellingLoc(Loc);
+    auto ExpansionLoc = SM.getExpansionLoc(Loc);
+    Token SpellingTok, ExpansionTok;
+    if (Lexer::getRawToken(SpellingLoc, SpellingTok, SM, LangOpts,
+                           /*IgnoreWhiteSpace=*/false)) {
+      return std::nullopt;
+    }
+    if (Lexer::getRawToken(ExpansionLoc, ExpansionTok, SM, LangOpts,
+                           /*IgnoreWhiteSpace=*/false)) {
+      return std::nullopt;
+    }
+    if (SpellingTok.getKind() == tok::greater &&
+        ExpansionTok.getKind() == tok::greatergreater) {
+      return ExpansionLoc;
+    }
+  }
+  return std::nullopt;
+}
+
+// Returns `Range`, but adjusted to smooth over oddities introduced by Clang.
+static CharSourceRange
+cleanRangeForAvoidingMacros(CharSourceRange Range, const SourceManager &SM,
+                            const LangOptions &LangOpts) {
+  if (Range.isTokenRange()) {
+    auto B =
+        getExpansionForNestedTemplateGreater(Range.getBegin(), SM, LangOpts);
+    auto E = getExpansionForNestedTemplateGreater(Range.getEnd(), SM, LangOpts);
+    if (E) {
+      // We can't use the expansion location with a token range, because that
+      // will lex as `>>`. So we instead convert to a char range.
+      return CharSourceRange::getCharRange(B.value_or(Range.getBegin()),
+                                           E->getLocWithOffset(1));
+    } else if (B) {
+      return CharSourceRange::getTokenRange(*B, Range.getEnd());
+    }
+  }
+  return Range;
+}
+
 static CharSourceRange getRange(const CharSourceRange &EditRange,
                                 const SourceManager &SM,
                                 const LangOptions &LangOpts,
@@ -109,13 +159,14 @@ static CharSourceRange getRange(const CharSourceRange &EditRange,
   if (IncludeMacroExpansion) {
     Range = Lexer::makeFileCharRange(EditRange, SM, LangOpts);
   } else {
-    if (spelledInMacroDefinition(EditRange.getBegin(), SM) ||
-        spelledInMacroDefinition(EditRange.getEnd(), SM))
+    auto AdjustedRange = cleanRangeForAvoidingMacros(EditRange, SM, LangOpts);
+    if (spelledInMacroDefinition(AdjustedRange.getBegin(), SM) ||
+        spelledInMacroDefinition(AdjustedRange.getEnd(), SM))
       return {};
 
-    auto B = SM.getSpellingLoc(EditRange.getBegin());
-    auto E = SM.getSpellingLoc(EditRange.getEnd());
-    if (EditRange.isTokenRange())
+    auto B = SM.getSpellingLoc(AdjustedRange.getBegin());
+    auto E = SM.getSpellingLoc(AdjustedRange.getEnd());
+    if (AdjustedRange.isTokenRange())
       E = Lexer::getLocForEndOfToken(E, 0, SM, LangOpts);
     Range = CharSourceRange::getCharRange(B, E);
   }
diff --git a/clang/unittests/Tooling/SourceCodeTest.cpp b/clang/unittests/Tooling/SourceCodeTest.cpp
index 3641d2ee453f4a..def9701ccdf1e0 100644
--- a/clang/unittests/Tooling/SourceCodeTest.cpp
+++ b/clang/unittests/Tooling/SourceCodeTest.cpp
@@ -50,6 +50,15 @@ struct CallsVisitor : TestVisitor<CallsVisitor> {
   std::function<void(CallExpr *, ASTContext *Context)> OnCall;
 };
 
+struct TypeLocVisitor : TestVisitor<TypeLocVisitor> {
+  bool VisitTypeLoc(TypeLoc TL) {
+    OnCall(TL, Context);
+    return true;
+  }
+
+  std::function<void(TypeLoc, ASTContext *Context)> OnCall;
+};
+
 // Equality matcher for `clang::CharSourceRange`, which lacks `operator==`.
 MATCHER_P(EqualsRange, R, "") {
   return arg.isTokenRange() == R.isTokenRange() &&
@@ -510,6 +519,26 @@ int c = M3(3);
   Visitor.runOver(Code.code());
 }
 
+TEST(SourceCodeTest, InnerNestedTemplate) {
+  llvm::Annotations Code(R"cc(
+  template <typename T>
+  struct S {};
+
+  void f(S<S<S<int>>>);
+)cc");
+
+  TypeLocVisitor Visitor;
+  Visitor.OnCall = [](TypeLoc TL, ASTContext *Context) {
+    if (TL.getSourceRange().isInvalid())
+      return;
+    auto Range = CharSourceRange::getTokenRange(TL.getSourceRange());
+    EXPECT_TRUE(getFileRangeForEdit(Range, *Context,
+                                    /*IncludeMacroExpansion=*/false))
+        << TL.getSourceRange().printToString(Context->getSourceManager());
+  };
+  Visitor.runOver(Code.code(), TypeLocVisitor::Lang_CXX11);
+}
+
 TEST_P(GetFileRangeForEditTest, EditPartialMacroExpansionShouldFail) {
   std::string Code = R"cpp(
 #define BAR 10+

Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work - really subtle! I'd been wondering how clang handles this issue (of relexing returning the wrong token because it lacked context).

clang/unittests/Tooling/SourceCodeTest.cpp Show resolved Hide resolved
@tJener tJener merged commit 345c482 into llvm:main Apr 5, 2024
3 of 4 checks passed
@tJener tJener deleted the fix-edit-range-nested-template branch April 5, 2024 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants