Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CastExpr conversion function decl is sometimes corrupt #53044

Closed
kimgr opened this issue Jan 6, 2022 · 11 comments
Closed

CastExpr conversion function decl is sometimes corrupt #53044

kimgr opened this issue Jan 6, 2022 · 11 comments
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema"

Comments

@kimgr
Copy link
Contributor

kimgr commented Jan 6, 2022

This is a problem that surfaced in IWYU recently.

I'm pretty sure it's difficult to provoke it in the compiler proper, since CastExpr::getConversionFunction is only called in a single place, for diagnostics. Tools built on Clang tooling and the recursive AST visitor can fall victim to it, however.

I have only been able to reproduce it on plain source code (not preprocessed), and only in conjunction with the fmt library (https://github.com/fmtlib/fmt). Attached is a minimal RAV tool and instructions to reproduce.

What makes me think this is somewhat critical is that CastExpr::getConversionFunction returns an AccessSpecDecl in this scenario, which breaks all sorts of AST and LLVM RTTI invariants. I suspect we're looking at garbage data for some reason. I have tried building ubsan+asan-instrumented LLVM/Clang, but without being able to track anything down.

Repro project: ravrepro.tar.gz.

Thankful for any ideas, let me know if I can add more information.

References:

@davrec72
Copy link

davrec72 commented Jan 6, 2022

I believe the issue is with skipImplicitTemporary, called within CastExpr::getConversionFunction. The problematic SubExpr ends up as a ConstantExpr wrapping a CXXConstructExpr. ConstantExprs are implicit nodes added to help with constant evaluation, and so should be skipped here. I think adding the following at https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/Expr.cpp#L1910 should fix:

    // Skip ConstantExprs; they're implicit.
    if (auto *Constant = dyn_cast<ConstantExpr>(E))
      E = Constant->getSubExpr();

@kimgr
Copy link
Contributor Author

kimgr commented Jan 6, 2022

@drec357 Thank you!

I can confirm that inlining those two functions into

static NamedDecl *
my_get_conversion_function(CastExpr *expr) {
  // Skip implicit temporary stuffsies
  const Expr *SubExpr = nullptr;
  for (const CastExpr *E = expr; E; E = dyn_cast<ImplicitCastExpr>(SubExpr)) {
    // Skip through reference binding to temporary.
    if (auto *Materialize = dyn_cast<MaterializeTemporaryExpr>(E->getSubExpr()))
      SubExpr = Materialize->getSubExpr();
    else if (auto *Binder = dyn_cast<CXXBindTemporaryExpr>(E->getSubExpr()))
      SubExpr = Binder->getSubExpr();
    else if (auto *Constant = dyn_cast<ConstantExpr>(E->getSubExpr()))
      SubExpr = Constant->getSubExpr();
    else
      SubExpr = E->getSubExpr();

    if (E->getCastKind() == CK_ConstructorConversion)
      return cast<CXXConstructExpr>(SubExpr)->getConstructor();

    if (E->getCastKind() == CK_UserDefinedConversion) {
      if (auto *MCE = dyn_cast<CXXMemberCallExpr>(SubExpr))
        return MCE->getMethodDecl();
    }
  }
  return nullptr;
}

makes both c++17 and c++20 behave as expected.

I'm trying to really understand why we get a zero declkind, though...? Is it because without assertions, the LLVM RTTI just blindly reinterprets the ConstantExpr as a CXXConstructExpr?

@davrec72
Copy link

davrec72 commented Jan 7, 2022 via email

@kimgr
Copy link
Contributor Author

kimgr commented Jan 7, 2022

Right, makes sense. I was a bit surprised it got that far, but getConstructor really just returns a pointer, so with a little luck that u64 represents an address inside the ASTContext arena.

With that additional context, I've been able to reduce the problematic input to:

// nofmt.cc
struct X {
  consteval X(const char*) {
  }
};

void t() {
  X x = "foobar";
}

This can be provoked using

$ clang-14 -fsyntax-only -Xclang -ast-dump=json -std=c++20 nofmt.cc | grep -A10 "conversionFunc"
                      "conversionFunc": {
                        "id": "0x205e298",
                        "kind": "AccessSpecDecl"
                      },
                      "inner": [
                        {
                          "id": "0x205e2c8",
                          "kind": "ConstantExpr",
                          "range": {
                            "begin": {
                              "offset": 67,

The JSON AST dumper also calls getConversionFunction, and the smoking gun is the "kind": "AccessSpecDecl" line.

I'm not quite sure how to turn that into a lit test.

@kimgr
Copy link
Contributor Author

kimgr commented Jan 7, 2022

I suppose a case could be made for this to be a RecursiveASTVisitor unittest, too.

@kimgr
Copy link
Contributor Author

kimgr commented Jan 15, 2022

I have just uploaded a patch for this at https://reviews.llvm.org/D117391 (which depends on a clang-reformat in https://reviews.llvm.org/D117390). @drec357 I'll add you as reviewer, I hope you don't mind.

kimgr added a commit to kimgr/include-what-you-use that referenced this issue Feb 26, 2022
When a consteval function is called, its call-expression is wrapped in a
ConstantExpr in the Clang AST. There's an upstream bug [1] where Clang
doesn't account for that when searching the AST for a cast-expression's
potential user-defined conversion function, leading to an assertion
failure or a crash.

Reimplement GetConversionFunction so it works in this case. This can
safely be reverted when the upstream bug is fixed.

[1] llvm/llvm-project#53044
kimgr added a commit to kimgr/include-what-you-use that referenced this issue Feb 27, 2022
When a consteval function is called, its call-expression is wrapped in a
ConstantExpr in the Clang AST. There's an upstream bug [1] where Clang
doesn't account for that when searching the AST for a cast-expression's
potential user-defined conversion function, leading to an assertion
failure or a crash.

Reimplement GetConversionFunction so it works in this case. This can
safely be reverted when the upstream bug is fixed.

[1] llvm/llvm-project#53044
kimgr added a commit to include-what-you-use/include-what-you-use that referenced this issue Feb 27, 2022
When a consteval function is called, its call-expression is wrapped in a
ConstantExpr in the Clang AST. There's an upstream bug [1] where Clang
doesn't account for that when searching the AST for a cast-expression's
potential user-defined conversion function, leading to an assertion
failure or a crash.

Reimplement GetConversionFunction so it works in this case. This can
safely be reverted when the upstream bug is fixed.

[1] llvm/llvm-project#53044
AaronBallman added a commit that referenced this issue Mar 21, 2022
Full-expressions are Sema-generated implicit nodes that cover
constant-expressions and expressions-with-cleanup for temporaries.

Ignore those as part of implicit-ignore, and also remove too-aggressive
IgnoreImplicit (which includes nested ImplicitCastExprs, for example)
on unpacked sub-expressions.

Add some unittests to demonstrate that RecursiveASTVisitor sees through
ConstantExpr nodes correctly.

Adjust cxx2a-consteval test to cover diagnostics for nested consteval
expressions that were previously missed.

Fixes bug #53044.
@kimgr
Copy link
Contributor Author

kimgr commented Mar 21, 2022

@AaronBallman Thanks for helping land this! I'd like to reintegrate and double-check this with include-what-you-use built against llvm's mainline before closing. I'll report back here as soon as I've confirmed.

@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed new issue labels Mar 21, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 21, 2022

@llvm/issue-subscribers-clang-frontend

@kimgr
Copy link
Contributor Author

kimgr commented Mar 22, 2022

Issue is indeed fixed as of 403d7d8. Is there a procedure for closing, or should I just hit the button?

@AaronBallman
Copy link
Collaborator

The procedure is to link to the git hash for the commit with the fix and close. I'll let you do the honors for this one given it was your fix (thanks again!). :-)

@kimgr
Copy link
Contributor Author

kimgr commented Mar 22, 2022

Thanks, closing!

@kimgr kimgr closed this as completed Mar 22, 2022
huizongsong added a commit to huizongsong/include-what-you-use that referenced this issue Aug 26, 2022
When a consteval function is called, its call-expression is wrapped in a
ConstantExpr in the Clang AST. There's an upstream bug [1] where Clang
doesn't account for that when searching the AST for a cast-expression's
potential user-defined conversion function, leading to an assertion
failure or a crash.

Reimplement GetConversionFunction so it works in this case. This can
safely be reverted when the upstream bug is fixed.

[1] llvm/llvm-project#53044
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema"
Projects
None yet
Development

No branches or pull requests

5 participants