[clang] Fix self-capturing __block variables #89475

Open · wants to merge 3 commits into base: main
Conversation

ille-apple

This is an updated version of https://reviews.llvm.org/D90434 from 2020. Due to time constraints I stopped working on the patch, and unfortunately never got back around to it until now.

Clang special-cases __block variables whose initializer contains a block literal that captures the variable itself. But this special case doesn't work for variables of record types, causing Clang to emit broken code.

Fix this by initializing such variables directly on the heap, and add a warning for when this needs to be done.

Background on the special case

__block variables are unique in that their address can change during the lifetime of the variable. Each __block variable starts on the stack. But when a block is retained using Block_copy, both the block itself and any __block variables it captures are moved to the heap.

What's tricky is when this move happens while the variable is still being initialized. For example:

#include <stdio.h>
#include <Block.h>

int copyBlockAndReturnInt(void (^block)(void)) {
    Block_copy(block);
    return 42;
}

int main() {
  __block int val = copyBlockAndReturnInt(^{
      printf("%d\n", val);
  });
}

Here, when Block_copy is called, val is moved to the heap while still in an uninitialized state. Then when copyBlockAndReturnInt returns, main needs to store the return value to the variable's new location, not to the original stack slot.

This can only happen if a block literal that captures the variable is located syntactically within the initializer itself. If it's before the initializer then it couldn't capture the variable, and if it's after the initializer then it couldn't execute during initialization.

So Clang detects when such a literal exists inside the initializer; CGDecl.cpp refers to this condition as capturedByInit. When such a literal is found, the initialization code operates in a special mode. In several functions in that file, there's an LValue object and an accompanying capturedByInit flag. If capturedByInit is false (the typical case), then the lvalue points to the object being initialized. But if it's true, the lvalue points to the byref header. Then, depending on the type of the variable, you end up at one of many checks of the form:

if (capturedByInit)
  drillIntoBlockVariable(*this, lvalue, cast<VarDecl>(D));

drillIntoBlockVariable emits a load of the forwarding pointer from the byref header (to find the object's current location), and then modifies lvalue to point to the object, unifying the two cases. Importantly, this happens at the last minute: after emitting code to evaluate the initializer, but before emitting the store instruction to write the evaluated value into the variable.
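To make this concrete, here is a hand-written C sketch of the byref layout and the deferred store. The field names follow the blocks ABI described above, but the code is illustrative, not Clang's actual output:

struct byref_int {
    void *isa;
    struct byref_int *forwarding; // self while on the stack; the heap copy after a move
    int flags;
    int size;
    int value;                    // the __block variable itself
};

// What drillIntoBlockVariable arranges: load the forwarding pointer only
// after the initializer has been evaluated, then store through it, so the
// write lands at the variable's current location even if Block_copy moved
// it mid-initialization.
static void storeInit(struct byref_int *byref, int initValue) {
    byref->forwarding->value = initValue;
}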

Okay, but then why are there two different code paths? If the address calculation can be safely deferred, why not just do that all the time rather than having a separate mode just for self-capturing __block variables?

Because it can't always be safely deferred.

Harder cases

First, consider the case where the variable being initialized is a C struct. This is already problematic. drillIntoBlockVariable is supposed to be called after evaluating the initializer but before doing a store. But when initializing a struct, Clang doesn't evaluate the whole initializer before storing. Instead it evaluates the first field, stores the first field, evaluates the second field, stores the second field, and so on. This is actually necessary for correctness [1] given code like:

struct Foo { int a, b; };
struct Foo foo = {1, foo.a};

But if any one of the field evaluations can move the object, then Clang would need to reload the forwarding pointer before storing each field. And that would need to apply recursively to any subfields. Such a scheme is definitely possible, but Clang doesn't implement it.
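Sketched in C, such a scheme would have to re-read the forwarding pointer around every field access, roughly like this (byref layout as in the earlier sketch; all names illustrative):

struct Foo { int a, b; }; // as above

struct byref_Foo {
    void *isa;
    struct byref_Foo *forwarding;
    int flags;
    int size;
    struct Foo value;
};

static void initFoo(struct byref_Foo *byref) {
    byref->forwarding->value.a = 1;       // evaluate and store the first field
    int tmp = byref->forwarding->value.a; // evaluate the second field (foo.a)...
    byref->forwarding->value.b = tmp;     // ...reloading forwarding before each store
}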

And if instead of a C struct you have a C++ class, supporting relocation to the heap becomes flat-out impossible. Consider:

#include <stdio.h>
#include <Block.h>

struct Foo {
    void (^block_)(void);
    Foo(void (^block)(void)) {
        printf("I am at %p\n", this);
        block_ = Block_copy(block);
        printf("I am now at %p\n", this); // shouldn't change!
    }
};

void test() {
    __block Foo foo(^{
        printf("%p\n", &foo);
    });
}

C++ constructors can observe the address of the object being constructed, and expect that address not to change. Once the object has finished being constructed, it can be relocated by calling the move or copy constructor, and that's what Clang uses for block captures in the non-self-capturing case. But in this case, Block_copy is called while the stack object is still running its constructor. It would be nonsensical to try to move out of it in this state.

Clang's current behavior

So what does Clang currently do? Well, for both C structs and C++ classes (as well as unions [2]), it runs into this TODO dating to 2011:

case TEK_Aggregate:
  [...]
    // TODO: how can we delay here if D is captured by its initializer?

Clang simply neglects to call drillIntoBlockVariable and falls into its normal initialization code. That means the lvalue is initialized as if it pointed to the object, when it actually points to the byref header. As such, the generated code is non-functional. And this is true for all __block variables of record type that Clang detects as self-capturing, even if the initializer never actually calls Block_copy.
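For instance, here is a minimal illustrative case that miscompiles without this patch, even though Block_copy is never called:

struct Pair { int a, b; };

void test(void) {
    // The initializer syntactically contains a block capturing 'p', so
    // Clang takes the capturedByInit path -- and for this record type it
    // emits stores relative to the byref header instead of the object.
    __block struct Pair p = { 1, ^{ return p.a; }() };
    (void)p;
}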

Solution

As stated, it's impossible for self-capturing variables of at least some record types to support moving to the heap. But there is a way to make them work regardless. Any given __block variable can move at most once, from the stack to the heap. Once on the heap, it never needs to move again. So, as suggested by @rjmccall in 2020, we can create a heap allocation before initialization even starts, and then initialize the object directly on the heap, knowing its address will remain stable.

This patch applies this approach to all self-capturing __block variables with record type, i.e. all variables that previously triggered the broken code generation.

The obvious downside of the approach is that we pay the cost of a heap allocation even if Block_copy is never called. Conveniently, this can never be a performance regression per se, since the affected cases didn't work at all before. Still, there is room for future improvement, such as by making mid-initialization relocation work for C structs [3], or detecting initializers that do contain a block literal but can't possibly call Block_copy on it.

To assist performance-sensitive codebases, this patch adds a warning, -Wimplicit-block-var-alloc, that triggers whenever a block requires an implicit allocation. Since most codebases probably don't care, this warning is disabled by default and is not included in any standard warning groups.
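For illustration, using the diagnostic wording this patch adds (visible in the DiagnosticSemaKinds.td hunk below), compiling code like the following with -Wimplicit-block-var-alloc should produce roughly the output indicated in the comments; the exact rendering is a guess on my part:

struct S { int x; };

void test(void) {
    // warning: variable 'v' will be initialized in a heap allocation
    //          [-Wimplicit-block-var-alloc]
    // note: because it is captured by a block used in its own initializer
    __block struct S v = { ^{ return v.x; }() };
    (void)v;
}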

Allocation failure

Another downside of the approach is that because the heap allocation is completely implicit, there is no way to check for allocation failure. Normally, heap allocation happens only when you call Block_copy, which means you can catch allocation failure by checking whether Block_copy returns NULL.

...Or so I thought. It turns out that checking Block_copy's return value is only a half-measure. Block_copy does return NULL if there's an allocation failure when moving the block itself to the heap. But then Block_copy calls into the copy helper to move captured variables to the heap, which can also fail. If it does, the copy helper has no way to report that failure. Instead you just get a crash (which should probably be a proper abort; Apple folks, see rdar://126632757). This limitation is more or less baked into the ABI, since copy helpers return void. Even xnu, which uses blocks in kernel space, does not support allocation failure for them - though, luckily, xnu's version waits for memory to be available rather than crashing.
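For reference, the half-measure looks like this. This is a sketch; the point is that only the block's own allocation failure is observable:

#include <Block.h>
#include <stddef.h>

// Returns 1 and writes the heap block on success, 0 on failure.
static int tryRetain(void (^stackBlock)(void), void (^*out)(void)) {
    void (^heapBlock)(void) = Block_copy(stackBlock);
    if (heapBlock == NULL)
        return 0; // the block itself failed to allocate
    // If moving a captured __block variable to the heap failed instead,
    // the copy helper has already crashed: no error path reaches here.
    *out = heapBlock;
    return 1;
}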

The point is, if recovery from allocation failure is already not a thing, then this patch isn't responsible for breaking it. Even if blocks are changed someday to support it, it would probably be desired only by a tiny minority of codebases, and those codebases could avoid implicit allocations by enabling the new warning and setting it as an error.

ABI limitation

The blocks runtime has no API to allocate a byref struct directly on the heap, because it's never been necessary before. There is only _Block_object_assign to move one from the stack to the heap. It would be nice to add an API to allocate directly on the heap, but for the dual sakes of backward compatibility and laziness, I have not attempted to do so.

Instead, the codegen works with the existing runtime, as follows. First, a byref struct is allocated on the stack (including space for the object), but only the header is initialized; the object part is left uninitialized. Then _Block_object_assign is called to migrate it to the heap (which unfortunately pays the cost of memcpying the uninitialized stack memory to the heap). This produces the desired heap allocation, but also increases the reference count, which is balanced by immediately calling _Block_object_dispose. After that, the object can finally be initialized inside the new heap allocation.
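In C-like terms, the emitted sequence corresponds to the sketch below (compare emitByrefInitOnHeap in the diff further down). _Block_object_assign and _Block_object_dispose are the real runtime entry points; the byref layout, the BLOCK_FIELD_IS_BYREF definition, and the two helper functions are illustrative stand-ins:

#include <Block.h>

enum { BLOCK_FIELD_IS_BYREF = 8 }; // value from the runtime's Block_private.h

struct Obj { int x; };

struct byref_Obj {
    void *isa;
    struct byref_Obj *forwarding;
    int flags;
    int size;
    struct Obj object; // left uninitialized on the stack
};

// Stand-ins for the code Clang emits around this sequence.
void initByrefHeader(struct byref_Obj *b); // isa, forwarding, flags, size
void constructObj(struct Obj *o);          // the variable's real initializer

void initOnHeap(struct byref_Obj *stackByref) {
    initByrefHeader(stackByref);
    void *ignored; // out-parameter we don't need; we use forwarding instead
    // Move the byref struct to the heap (memcpying the still-uninitialized
    // object bytes) and take a reference; safe because the copy helper was
    // forced to be a no-op.
    _Block_object_assign(&ignored, stackByref, BLOCK_FIELD_IS_BYREF);
    // Balance the reference taken by _Block_object_assign.
    _Block_object_dispose(stackByref, BLOCK_FIELD_IS_BYREF);
    // Now initialize the object at its stable heap address.
    constructObj(&stackByref->forwarding->object);
}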

Improved self-capture analysis

The existing check for self-capturing __block variables is implemented in the function isCapturedBy. But it turns out that that implementation is overcomplicated for no good reason. In ancient days (2011), isCapturedBy only worked on Exprs, and it recursed on the Expr's children, which it assumed would also be Exprs. This assumption turned out to be false for GNU statement expressions. The simple fix would have been to change isCapturedBy to accept and recurse on arbitrary Stmts instead, since children() is defined for all Stmts, not just Exprs. I'm not sure if there was some reason that wasn't possible at the time, but what happened instead is that a special case was added for statement expressions, which over a series of patches got more and more convoluted.

This patch replaces the whole thing with an EvaluatedExprVisitor (another suggestion from @rjmccall). This is not only simpler, but also more precise, eliminating false positive self-captures including:

  • Block literals in unevaluated contexts (sizeof(^{ ... })).

  • Some types of statements inside GNU statement expressions.

This patch also moves the self-capture check earlier in the compilation process, from CodeGen to Sema. This is necessary in order to support the aforementioned warning.
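As a rough sketch, such a visitor could look like the following; class and member names here are illustrative, not the patch's actual code:

#include "clang/AST/EvaluatedExprVisitor.h"

namespace {
using namespace clang;

class SelfCaptureFinder : public EvaluatedExprVisitor<SelfCaptureFinder> {
  const VarDecl *Var;

public:
  bool Found = false;

  SelfCaptureFinder(const ASTContext &Ctx, const VarDecl *Var)
      : EvaluatedExprVisitor<SelfCaptureFinder>(Ctx), Var(Var) {}

  void VisitBlockExpr(BlockExpr *BE) {
    for (const BlockDecl::Capture &Cap : BE->getBlockDecl()->captures())
      if (Cap.getVariable() == Var)
        Found = true;
    // No need to recurse into the block: a capture by a nested block also
    // appears in this block's own capture list.
  }
};
} // namespace

Because EvaluatedExprVisitor skips unevaluated operands, a block literal inside sizeof(...) is never visited, which is what removes that class of false positive.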

Constant initialization

This patch also fixes a related bug where self-capturing __block variables could trigger this assert:

assert(!capturedByInit && "constant init contains a capturing block?");

The assert is important, because constant initialization doesn't support the drillIntoBlockVariable mechanism. Now, a constant initializer obviously can't include a call to Block_copy; in fact, it can't include block literals at all unless they have no captures. But I found two cases where the assert can be triggered anyway.

First, the initializer can syntactically contain a capturing block literal but never evaluate it:

__block constexpr int foo = true ? 1 : ^{ return foo; }();

This is addressed in this patch by forcing constant-initialized variables to be treated as non-self-capturing.

Second, the initializer can simply not be constant. With variables of types like const int, Clang first tries to constant-evaluate the initializer, then falls back to runtime initialization if that fails. But the assert is located in a place where it fires merely because constant evaluation was attempted, regardless of whether it succeeded. This is addressed in this patch by moving the assert down and conditioning it on constant evaluation succeeding.
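A hypothetical reproducer for this second case: the const-qualified type makes Clang attempt constant evaluation, which fails at the non-constant condition, yet the assert previously fired anyway:

extern int flag;

void test(void) {
    __block const int foo = flag ? 1 : ^{ return foo; }();
    (void)foo;
}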

Footnotes

  1. Supporting such code appears to be required by the C++ spec (see
    https://timsong-cpp.github.io/cppwp/n4861/dcl.init.aggr#6), though not
    by the C spec (see https://port70.net/~nsz/c/c11/n1570.html#6.7.9p23).

  2. There are other TEK_Aggregate types besides records, but all such
    types either can't be captured by blocks (arrays) or can't be the
    type of a local variable at all.

  3. Though even in C, moving mid-initializer could break code that took the address of the variable:

    struct Foo { void *a; void *(^b)(void); };
    __block struct Foo x = {&x.b, Block_copy(^{ return x.a; })};
    assert(x.a == &x.b); // fails without up-front heap allocation
    

    However, it's a bit less obvious that such code needs to work; perhaps the user should just be aware the variable will be moving. It's not like C++ where construction is clearly an indivisible operation performed at a single address.

llvmbot added labels on Apr 20, 2024: clang (Clang issues not falling into any other category), clang:frontend (Language frontend issues, e.g. anything involving "Sema"), clang:modules (C++20 modules and Clang Header Modules), clang:codegen.
@llvmbot (Collaborator) commented Apr 20, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: None (ille-apple)

Patch is 33.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89475.diff

12 Files Affected:

  • (modified) clang/include/clang/AST/Decl.h (+20-2)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+6)
  • (modified) clang/lib/AST/Decl.cpp (+13)
  • (modified) clang/lib/CodeGen/CGBlocks.cpp (+57-24)
  • (modified) clang/lib/CodeGen/CGBlocks.h (+1)
  • (modified) clang/lib/CodeGen/CGDecl.cpp (+32-82)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+11-1)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+6-2)
  • (modified) clang/lib/Sema/Sema.cpp (+65)
  • (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+1)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+4-2)
  • (modified) clang/test/CodeGenCXX/block-capture.cpp (+124-3)
diff --git a/clang/include/clang/AST/Decl.h b/clang/include/clang/AST/Decl.h
index 8b121896d66d15..dc816786d7f11d 100644
--- a/clang/include/clang/AST/Decl.h
+++ b/clang/include/clang/AST/Decl.h
@@ -1103,6 +1103,11 @@ class VarDecl : public DeclaratorDecl, public Redeclarable<VarDecl> {
 
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsCXXCondDecl : 1;
+
+    /// Whether this is a __block variable that is captured by a block
+    /// referenced in its own initializer.
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned CapturedByOwnInit : 1;
   };
 
   union {
@@ -1588,10 +1593,23 @@ class VarDecl : public DeclaratorDecl, public Redeclarable<VarDecl> {
   /// escaping block.
   bool isNonEscapingByref() const;
 
-  void setEscapingByref() {
-    NonParmVarDeclBits.EscapingByref = true;
+  void setEscapingByref();
+
+  /// Indicates this is a __block variable that is captured by a block
+  /// referenced in its own initializer.
+  bool isCapturedByOwnInit() const {
+    return !isa<ParmVarDecl>(this) && NonParmVarDeclBits.CapturedByOwnInit;
   }
 
+  void setCapturedByOwnInit() {
+    assert(!isa<ParmVarDecl>(this));
+    NonParmVarDeclBits.CapturedByOwnInit = true;
+  }
+
+  /// Check if this is a __block variable which needs to be initialized
+  /// directly on the heap.
+  bool needsInitOnHeap() const;
+
   bool isCXXCondDecl() const {
     return isa<ParmVarDecl>(this) ? false : NonParmVarDeclBits.IsCXXCondDecl;
   }
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 1bbe76ff6bd2ac..f19ee8fbc21f3a 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2356,6 +2356,12 @@ def note_var_fixit_add_initialization : Note<
 def note_uninit_fixit_remove_cond : Note<
   "remove the %select{'%1' if its condition|condition if it}0 "
   "is always %select{false|true}2">;
+def warn_implicit_block_var_alloc : Warning<
+  "variable %q0 will be initialized in a heap allocation">,
+  InGroup<DiagGroup<"implicit-block-var-alloc">>, DefaultIgnore;
+def note_because_captured_by_block : Note<
+  "because it is captured by a block used in its own initializer">;
+
 def err_init_incomplete_type : Error<"initialization of incomplete type %0">;
 def err_list_init_in_parens : Error<
   "cannot initialize %select{non-class|reference}0 type %1 with a "
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 474e0ccde5bbf7..fda9010ba126d7 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -2680,6 +2680,19 @@ bool VarDecl::isNonEscapingByref() const {
   return hasAttr<BlocksAttr>() && !NonParmVarDeclBits.EscapingByref;
 }
 
+void VarDecl::setEscapingByref() {
+  assert(!isa<ParmVarDecl>(this) && hasAttr<BlocksAttr>());
+  NonParmVarDeclBits.EscapingByref = true;
+}
+
+bool VarDecl::needsInitOnHeap() const {
+  // We need direct initialization on the heap for self-capturing __block
+  // variables of types that cause CodeGenFunction::getEvaluationKind to return
+  // TEK_Aggregate.  The only such types that can be captured are records.
+  return isCapturedByOwnInit() &&
+         getType().getAtomicUnqualifiedType()->isRecordType();
+}
+
 bool VarDecl::hasDependentAlignment() const {
   QualType T = getType();
   return T->isDependentType() || T->isUndeducedType() ||
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index 2742c39965b2c8..4bea057a15a52c 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -45,6 +45,13 @@ CGBlockInfo::CGBlockInfo(const BlockDecl *block, StringRef name)
     name = name.substr(1);
 }
 
+BlockByrefHelpers::BlockByrefHelpers(const BlockByrefInfo &info)
+    : CopyHelper(nullptr), DisposeHelper(nullptr),
+      // The alignment we care about for the purposes of uniquing byref
+      // helpers is the alignment of the actual byref value field.
+      Alignment(info.ByrefAlignment.alignmentAtOffset(info.FieldOffset)),
+      ForceNoopCopy(info.ForceNoopCopy) {}
+
 // Anchor the vtable to this translation unit.
 BlockByrefHelpers::~BlockByrefHelpers() {}
 
@@ -2127,8 +2134,8 @@ class ObjectByrefHelpers final : public BlockByrefHelpers {
   BlockFieldFlags Flags;
 
 public:
-  ObjectByrefHelpers(CharUnits alignment, BlockFieldFlags flags)
-    : BlockByrefHelpers(alignment), Flags(flags) {}
+  ObjectByrefHelpers(const BlockByrefInfo &info, BlockFieldFlags flags)
+      : BlockByrefHelpers(info), Flags(flags) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2161,7 +2168,7 @@ class ObjectByrefHelpers final : public BlockByrefHelpers {
 /// Emits the copy/dispose helpers for an ARC __block __weak variable.
 class ARCWeakByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCWeakByrefHelpers(CharUnits alignment) : BlockByrefHelpers(alignment) {}
+  ARCWeakByrefHelpers(const BlockByrefInfo &info) : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2182,7 +2189,7 @@ class ARCWeakByrefHelpers final : public BlockByrefHelpers {
 /// that's not of block-pointer type.
 class ARCStrongByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCStrongByrefHelpers(CharUnits alignment) : BlockByrefHelpers(alignment) {}
+  ARCStrongByrefHelpers(const BlockByrefInfo &info) : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2218,8 +2225,8 @@ class ARCStrongByrefHelpers final : public BlockByrefHelpers {
 /// variable that's of block-pointer type.
 class ARCStrongBlockByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCStrongBlockByrefHelpers(CharUnits alignment)
-    : BlockByrefHelpers(alignment) {}
+  ARCStrongBlockByrefHelpers(const BlockByrefInfo &info)
+      : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2248,9 +2255,9 @@ class CXXByrefHelpers final : public BlockByrefHelpers {
   const Expr *CopyExpr;
 
 public:
-  CXXByrefHelpers(CharUnits alignment, QualType type,
+  CXXByrefHelpers(const BlockByrefInfo &info, QualType type,
                   const Expr *copyExpr)
-    : BlockByrefHelpers(alignment), VarType(type), CopyExpr(copyExpr) {}
+    : BlockByrefHelpers(info), VarType(type), CopyExpr(copyExpr) {}
 
   bool needsCopy() const override { return CopyExpr != nullptr; }
   void emitCopy(CodeGenFunction &CGF, Address destField,
@@ -2276,8 +2283,8 @@ class NonTrivialCStructByrefHelpers final : public BlockByrefHelpers {
   QualType VarType;
 
 public:
-  NonTrivialCStructByrefHelpers(CharUnits alignment, QualType type)
-    : BlockByrefHelpers(alignment), VarType(type) {}
+  NonTrivialCStructByrefHelpers(const BlockByrefInfo &info, QualType type)
+      : BlockByrefHelpers(info), VarType(type) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2336,7 +2343,7 @@ generateByrefCopyHelper(CodeGenFunction &CGF, const BlockByrefInfo &byrefInfo,
     // Create a scope with an artificial location for the body of this function.
   auto AL = ApplyDebugLocation::CreateArtificial(CGF);
 
-  if (generator.needsCopy()) {
+  if (generator.needsCopy() && !generator.ForceNoopCopy) {
     // dst->x
     Address destField = CGF.GetAddrOfLocalVar(&Dst);
     destField = Address(CGF.Builder.CreateLoad(destField), byrefInfo.Type,
@@ -2458,18 +2465,13 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
 
   auto &byrefInfo = getBlockByrefInfo(&var);
 
-  // The alignment we care about for the purposes of uniquing byref
-  // helpers is the alignment of the actual byref value field.
-  CharUnits valueAlignment =
-    byrefInfo.ByrefAlignment.alignmentAtOffset(byrefInfo.FieldOffset);
-
   if (const CXXRecordDecl *record = type->getAsCXXRecordDecl()) {
     const Expr *copyExpr =
         CGM.getContext().getBlockVarCopyInit(&var).getCopyExpr();
     if (!copyExpr && record->hasTrivialDestructor()) return nullptr;
 
     return ::buildByrefHelpers(
-        CGM, byrefInfo, CXXByrefHelpers(valueAlignment, type, copyExpr));
+        CGM, byrefInfo, CXXByrefHelpers(byrefInfo, type, copyExpr));
   }
 
   // If type is a non-trivial C struct type that is non-trivial to
@@ -2477,7 +2479,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
   if (type.isNonTrivialToPrimitiveDestructiveMove() == QualType::PCK_Struct ||
       type.isDestructedType() == QualType::DK_nontrivial_c_struct)
     return ::buildByrefHelpers(
-        CGM, byrefInfo, NonTrivialCStructByrefHelpers(valueAlignment, type));
+        CGM, byrefInfo, NonTrivialCStructByrefHelpers(byrefInfo, type));
 
   // Otherwise, if we don't have a retainable type, there's nothing to do.
   // that the runtime does extra copies.
@@ -2499,7 +2501,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
     // byref routines.
     case Qualifiers::OCL_Weak:
       return ::buildByrefHelpers(CGM, byrefInfo,
-                                 ARCWeakByrefHelpers(valueAlignment));
+                                 ARCWeakByrefHelpers(byrefInfo));
 
     // ARC __strong __block variables need to be retained.
     case Qualifiers::OCL_Strong:
@@ -2507,13 +2509,13 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
       // transfer possible.
       if (type->isBlockPointerType()) {
         return ::buildByrefHelpers(CGM, byrefInfo,
-                                   ARCStrongBlockByrefHelpers(valueAlignment));
+                                   ARCStrongBlockByrefHelpers(byrefInfo));
 
       // Otherwise, we transfer ownership of the retain from the stack
       // to the heap.
       } else {
         return ::buildByrefHelpers(CGM, byrefInfo,
-                                   ARCStrongByrefHelpers(valueAlignment));
+                                   ARCStrongByrefHelpers(byrefInfo));
       }
     }
     llvm_unreachable("fell out of lifetime switch!");
@@ -2533,7 +2535,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
     flags |= BLOCK_FIELD_IS_WEAK;
 
   return ::buildByrefHelpers(CGM, byrefInfo,
-                             ObjectByrefHelpers(valueAlignment, flags));
+                             ObjectByrefHelpers(byrefInfo, flags));
 }
 
 Address CodeGenFunction::emitBlockByrefAddress(Address baseAddr,
@@ -2653,11 +2655,39 @@ const BlockByrefInfo &CodeGenFunction::getBlockByrefInfo(const VarDecl *D) {
   info.FieldOffset = varOffset;
   info.ByrefAlignment = std::max(varAlign, getPointerAlign());
 
+  // If we're initializing directly on the heap, then we should emit a
+  // no-op copy helper, both because we don't need a real one (the
+  // object will never move), and because a real one would break the
+  // pre-init _Block_object_assign.
+  info.ForceNoopCopy = D->needsInitOnHeap();
+
   auto pair = BlockByrefInfos.insert({D, info});
   assert(pair.second && "info was inserted recursively?");
   return pair.first->second;
 }
 
+void CodeGenFunction::emitByrefInitOnHeap(llvm::Value *P) {
+  // The object itself is initialized directly on the heap.  But for ABI
+  // backwards compatibility reasons, we have to set up a fake byref struct on
+  // the stack (with the structural components initialized but not the object
+  // itself), then call _Block_object_assign to move it to the heap (which is
+  // safe because we forced a no-op copy helper), then call
+  // _Block_object_dispose to release the extra ref from _Block_object_assign.
+  //
+  // 'P' points to the fake byref struct.
+
+  BlockFieldFlags flags = BLOCK_FIELD_IS_BYREF;
+  // Ignored out-parameter.  We'll use the forwarding pointer instead.
+  RawAddress out = CreateDefaultAlignTempAlloca(P->getType(), "initOnHeap.out");
+
+  llvm::Value *args[] = {Builder.CreateBitCast(out.getPointer(), VoidPtrTy),
+                         Builder.CreateBitCast(P, VoidPtrTy),
+                         llvm::ConstantInt::get(Int32Ty, flags.getBitMask())};
+  EmitNounwindRuntimeCall(CGM.getBlockObjectAssign(), args);
+
+  BuildBlockRelease(P, flags, false);
+}
+
 /// Initialize the structural components of a __block variable, i.e.
 /// everything but the actual object.
 void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
@@ -2699,8 +2729,8 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
   storeHeaderField(V, getPointerSize(), "byref.isa");
 
   // Store the address of the variable into its own forwarding pointer.
-  storeHeaderField(addr.emitRawPointer(*this), getPointerSize(),
-                   "byref.forwarding");
+  llvm::Value *pointer = addr.emitRawPointer(*this);
+  storeHeaderField(pointer, getPointerSize(), "byref.forwarding");
 
   // Blocks ABI:
   //   c) the flags field is set to either 0 if no helper functions are
@@ -2764,6 +2794,9 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
     auto layoutInfo = CGM.getObjCRuntime().BuildByrefLayout(CGM, type);
     storeHeaderField(layoutInfo, getPointerSize(), "byref.layout");
   }
+
+  if (emission.NeedsInitOnHeap)
+    emitByrefInitOnHeap(pointer);
 }
 
 void CodeGenFunction::BuildBlockRelease(llvm::Value *V, BlockFieldFlags flags,
diff --git a/clang/lib/CodeGen/CGBlocks.h b/clang/lib/CodeGen/CGBlocks.h
index 8d10c4f69b2026..526dbbfd9b38d3 100644
--- a/clang/lib/CodeGen/CGBlocks.h
+++ b/clang/lib/CodeGen/CGBlocks.h
@@ -139,6 +139,7 @@ class BlockByrefInfo {
   unsigned FieldIndex;
   CharUnits ByrefAlignment;
   CharUnits FieldOffset;
+  bool ForceNoopCopy;
 };
 
 /// Represents a type of copy/destroy operation that should be performed for an
diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
index ce6d6d8956076e..954dd1079215fd 100644
--- a/clang/lib/CodeGen/CGDecl.cpp
+++ b/clang/lib/CodeGen/CGDecl.cpp
@@ -1450,6 +1450,8 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) {
 
   bool isEscapingByRef = D.isEscapingByref();
   emission.IsEscapingByRef = isEscapingByRef;
+  emission.IsCapturedByOwnInit = D.isCapturedByOwnInit();
+  emission.NeedsInitOnHeap = D.needsInitOnHeap();
 
   CharUnits alignment = getContext().getDeclAlign(&D);
 
@@ -1692,68 +1694,6 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) {
   return emission;
 }
 
-static bool isCapturedBy(const VarDecl &, const Expr *);
-
-/// Determines whether the given __block variable is potentially
-/// captured by the given statement.
-static bool isCapturedBy(const VarDecl &Var, const Stmt *S) {
-  if (const Expr *E = dyn_cast<Expr>(S))
-    return isCapturedBy(Var, E);
-  for (const Stmt *SubStmt : S->children())
-    if (isCapturedBy(Var, SubStmt))
-      return true;
-  return false;
-}
-
-/// Determines whether the given __block variable is potentially
-/// captured by the given expression.
-static bool isCapturedBy(const VarDecl &Var, const Expr *E) {
-  // Skip the most common kinds of expressions that make
-  // hierarchy-walking expensive.
-  E = E->IgnoreParenCasts();
-
-  if (const BlockExpr *BE = dyn_cast<BlockExpr>(E)) {
-    const BlockDecl *Block = BE->getBlockDecl();
-    for (const auto &I : Block->captures()) {
-      if (I.getVariable() == &Var)
-        return true;
-    }
-
-    // No need to walk into the subexpressions.
-    return false;
-  }
-
-  if (const StmtExpr *SE = dyn_cast<StmtExpr>(E)) {
-    const CompoundStmt *CS = SE->getSubStmt();
-    for (const auto *BI : CS->body())
-      if (const auto *BIE = dyn_cast<Expr>(BI)) {
-        if (isCapturedBy(Var, BIE))
-          return true;
-      }
-      else if (const auto *DS = dyn_cast<DeclStmt>(BI)) {
-          // special case declarations
-          for (const auto *I : DS->decls()) {
-              if (const auto *VD = dyn_cast<VarDecl>((I))) {
-                const Expr *Init = VD->getInit();
-                if (Init && isCapturedBy(Var, Init))
-                  return true;
-              }
-          }
-      }
-      else
-        // FIXME. Make safe assumption assuming arbitrary statements cause capturing.
-        // Later, provide code to poke into statements for capture analysis.
-        return true;
-    return false;
-  }
-
-  for (const Stmt *SubStmt : E->children())
-    if (isCapturedBy(Var, SubStmt))
-      return true;
-
-  return false;
-}
-
 /// Determine whether the given initializer is trivial in the sense
 /// that it requires no code to be generated.
 bool CodeGenFunction::isTrivialInitializer(const Expr *Init) {
@@ -1918,8 +1858,7 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   // Check whether this is a byref variable that's potentially
   // captured and moved by its own initializer.  If so, we'll need to
   // emit the initializer first, then copy into the variable.
-  bool capturedByInit =
-      Init && emission.IsEscapingByRef && isCapturedBy(D, Init);
+  bool capturedByInit = emission.IsCapturedByOwnInit;
 
   bool locIsByrefHeader = !capturedByInit;
   const Address Loc =
@@ -1951,8 +1890,9 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   llvm::Constant *constant = nullptr;
   if (emission.IsConstantAggregate ||
       D.mightBeUsableInConstantExpressions(getContext())) {
-    assert(!capturedByInit && "constant init contains a capturing block?");
     constant = ConstantEmitter(*this).tryEmitAbstractForInitializer(D);
+    assert(!(constant && capturedByInit) &&
+           "constant init contains a capturing block?");
     if (constant && !constant->isZeroValue() &&
         (trivialAutoVarInit !=
          LangOptions::TrivialAutoVarInitKind::Uninitialized)) {
@@ -1975,6 +1915,10 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
     initializeWhatIsTechnicallyUninitialized(Loc);
     LValue lv = MakeAddrLValue(Loc, type);
     lv.setNonGC(true);
+    if (emission.NeedsInitOnHeap) {
+      drillIntoBlockVariable(*this, lv, &D);
+      capturedByInit = false; // We don't need to drill again.
+    }
     return EmitExprAsInit(Init, &D, lv, capturedByInit);
   }
 
@@ -2023,6 +1967,9 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
     return;
   }
   case TEK_Aggregate:
+    assert(
+        !capturedByInit &&
+        "Capture-by-initializer should have been handled by NeedsInitOnHeap!");
     if (type->isAtomicType()) {
       EmitAtomicInit(const_cast<Expr*>(init), lvalue);
     } else {
@@ -2031,7 +1978,6 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
         Overlap = AggValueSlot::DoesNotOverlap;
       else if (auto *FD = dyn_cast<FieldDecl>(D))
         Overlap = getOverlapForFieldInit(FD);
-      // TODO: how can we delay here if D is captured by its initializer?
       EmitAggExpr(init, AggValueSlot::forLValue(
                             lvalue, *this, AggValueSlot::IsDestructed,
                             AggValueSlot::DoesNotNeedGCBarriers,
@@ -2120,27 +2066,31 @@ void CodeGenFunction::EmitAutoVarCleanups(const AutoVarEmission &emission) {
   // us from jumping into any of these scopes anyway.
   if (!HaveInsertPoint()) return;
 
-  const VarDecl &D = *emission.Variable;
+  // If we're initializing directly on the heap, _Block_object_destroy will
+  // handle destruction, so we don't need to perform cleanups directly.
+  if (!emission.NeedsInitOnHeap) {
+    const VarDecl &D = *emission.Variable;
 
-  // Check the type for a cleanup.
-  if (QualType::DestructionKind dtorKind = D.needsDestruction(getContext()))
-    emitAutoVarTypeCleanup(emission, dtorKind);
+    // Check the type for a cleanup.
+    if (QualType::DestructionKind dtorKind = D.needsDestruction(getContext()))
+      emitAutoVarTypeCleanup(emission, dtorKind);
 
-  // In GC mode, honor objc_precise_lifetime.
-  if (getLangOpts().getGC() != LangOptions::NonGC &&
-      D.hasAttr<ObjCPreciseLifetimeAttr>()) {
-    EHStack.pushCleanup<ExtendGCLifetime>(NormalCleanup, &D);
-  }
+    // In GC mode, honor objc_precise_lifetime.
+    if (getLangOpts().getGC() != LangOptions::NonGC &&
+        D.hasAttr<ObjCPreciseLifetimeAttr>()) {
+      EHStack.pushCleanup<ExtendGCLifetime>(NormalCleanup, &D);
+    }
 
-  // Handle the cleanup attribute.
-  if (const CleanupAttr *CA = D.getAttr<CleanupAttr>()) {
-    const FunctionDecl *FD = CA->getFunctionDec...
[truncated]


@llvmbot
Copy link
Collaborator

llvmbot commented Apr 20, 2024

@llvm/pr-subscribers-clang-modules

Author: None (ille-apple)

Changes

This is an updated version of https://reviews.llvm.org/D90434 from 2020. Due to time constraints I stopped working on the patch, and unfortunately never got back around to it until now.

Clang special-cases __block variables whose initializer contains a block literal that captures the variable itself. But this special case doesn't work for variables of record types, causing Clang to emit broken code.

Fix this by initializing such variables directly on the heap, and add a warning for when this needs to be done.

Background on the special case

__block variables are unique in that their address can change during the lifetime of the variable. Each __block variable starts on the stack. But when a block is retained using Block_copy, both the block itself and any __block variables it captures are moved to the heap.

What's tricky is when this move happens while the variable is still being initialized. For example:

int copyBlockAndReturnInt(void (^block)(void)) {
    Block_copy(block);
    return 42;
}

int main() {
  __block int val = copyBlockAndReturnInt(^{
      printf("%d\n", val);
  });
}

Here, when Block_copy is called, val is moved to the heap while still in an uninitialized state. Then when copyBlockAndReturnInt returns, main needs to store the return value to the variable's new location, not to the original stack slot.

This can only happen if a block literal that captures the variable is located syntactically within the initializer itself. If it's before the initializer then it couldn't capture the variable, and if it's after the initializer then it couldn't execute during initialization.

So Clang detects when such a literal exists inside the initializer; CGDecl.cpp refers to this condition as capturedByInit. If it does, then the initialization code operates in a special mode. In several functions in that file, there's an LValue object and an accompanying capturedByInit flag. If capturedByInit is false (the typical case), then the lvalue points to the object being initialized. But if it's true, the lvalue points to the byref header. Then, depending on the type of the variable, you end up at one of many checks of the form:

if (capturedByInit)
  drillIntoBlockVariable(*this, lvalue, cast&lt;VarDecl&gt;(D));

drillIntoBlockVariable emits a load of the forwarding pointer from the byref header (to find the object's current location), and then modifies lvalue to point to the object, unifying the two cases. Importantly, this happens at the last minute: after emitting code to evaluate the initializer, but before emitting the store instruction to write the evaluated value into the variable.

Okay, but then why are there two different code paths? If the address calculation can be safely deferred, why not just do that all the time rather than having a separate mode just for self-capturing __block variables?

Because it can't always be safely deferred.

Harder cases

First, consider the case where variable being initialized is a C struct. This is already problematic. drillIntoBlockVariable is supposed to be called after evaluating the initializer but before doing a store. But when initializing a struct, Clang doesn't evaluate the whole initializer before storing. Instead it evaluates the first field, stores the first field, evaluates the second field, stores the second field, and so on. This is actually necessary for correctness1 given code like:

struct Foo { int a, b; };
struct Foo foo = {1, foo.a};

But if any one of the field evaluations can move the object, then Clang would need to reload the forwarding pointer before storing each field. And that would need to apply recursively to any subfields. Such a scheme is definitely possible, but Clang doesn't implement it.

And if instead of a C struct you have a C++ class, supporting relocation to the heap becomes flat-out impossible. Consider:

struct Foo {
    void (^block_)(void);
    Foo(void (^block)(void)) {
        printf("I am at %p\n", this);
        block_ = Block_copy(block);
        printf("I am now at %p\n", this); // shouldn't change!
    }
};

__block Foo foo(^{
    printf("%p\n", &amp;foo);
});

C++ constructors can observe the address of the object being constructed, and expect that address not to change. Once the object has finished being constructed, it can be relocated by calling the move or copy constructor, and that's what Clang uses for block captures in the non-self-capturing case. But in this case, Block_copy is called while the stack object is still running its constructor. It would be nonsensical to try to move out of it in this state.

Clang's current behavior

So what does Clang currently do? Well, for both C structs and C++ classes (as well as unions2), it runs into this TODO dating to 2011:

case TEK_Aggregate:
  [...]
    // TODO: how can we delay here if D is captured by its initializer?

Clang simply neglects to call drillIntoBlockVariable and falls into its normal initialization code. That means the lvalue is initialized as if it pointed to the object, when it actually points to the byref header. As such, the generated code is non-functional. And this is true for all __block variables of record type that Clang detects as self-capturing, even if the initializer never actually calls Block_copy.

Solution

As stated, it's impossible for self-capturing variables of at least some record types to support moving to the heap. But there is a way to make them work regardless. Any given __block variable can move at most once, from the stack to the heap. Once on the heap, it never needs to move again. So, as suggested by @rjmccall in 2020, we can create a heap allocation before initialization even starts, and then initialize the object directly on the heap, knowing its address will remain stable.

This patch applies this approach to all self-capturing __block variables with record type, i.e. all variables that previously triggered the broken code generation.

The obvious downside of the approach that we pay the cost of a heap allocation even if Block_copy is never called. Conveniently, this can never be a performance regression per se, since the affected cases didn't work at all before. Still, there is room for future improvement, such as by making mid-initialization relocation work for C structs3, or detecting initializers that do contain a block literal but can't possibly call Block_copy on it.

To assist performance-sensitive codebases, this patch adds a warning, -Wimplicit-block-var-alloc, that triggers whenever a block requires an implicit allocation. Since most codebases probably don't care, this warning is disabled by default and is not included in any standard warning groups.

Allocation failure

Another downside of the approach is that because the heap allocation is completely implicit, there is no way to check for allocation failure. Normally, heap allocation happens only when you call Block_copy, which means you can catch allocation failure by checking whether Block_copy returns NULL.

...Or so I thought. It turns out that checking Block_copy's return value is only a half-measure. Block_copy does return NULL if there's an allocation failure when moving the block itself to the heap. But then Block_copy calls into the copy helper to move captured variables to the heap, which can also fail. If it does, the copy helper has no way to report that failure. Instead you just get a crash (which should probably be a proper abort; Apple folks, see rdar://126632757). This limitation is more or less baked into the ABI, since copy helpers return void. Even xnu, which uses blocks in kernel space, does not support allocation failure for them - though, luckily, xnu's version waits for memory to be available rather than crashing.

The point is, if recovery from allocation failure is already not a thing, then this patch isn't responsible for breaking it. Even if blocks are changed someday to support it, it would probably be desired only by a tiny minority of codebases, and those codebases could avoid implicit allocations by enabling the new warning and setting it as an error.

ABI limitation

The blocks runtime has no API to allocate a byref struct directly on the heap, because it's never been necessary before. There is only _Block_object_assign to move one from the stack to the heap. It would be nice to add an API to allocate directly on the heap, but for the dual sakes of backward compatibility and laziness, I have not attempted to do so.

Instead, the codegen works with the existing runtime, as follows. First, a byref struct is allocated on the stack (including space for the object), but only the header is initialized; the object part is left uninitialized. Then _Block_object_assign is called to migrate it to the heap (which unfortunately pays the cost of memcpying the uninitialized stack memory to the heap). This produces the desired heap allocation, but also increases the reference count, which is balanced by immediately calling _Block_object_dispose. After that, the object can finally be initialized inside the new heap allocation.

Improved self-capture analysis

The existing check for self-capturing __block variables is implemented in the function isCapturedBy. But that implementation turns out to be overcomplicated for no good reason. In ancient days (2011), isCapturedBy only worked on Exprs, and it recursed on the Expr's children, which it assumed would also be Exprs. This assumption turned out to be false for GNU statement expressions. The simple fix would have been to change isCapturedBy to accept and recurse on arbitrary Stmts instead, since children() is defined for all Stmts, not just Exprs. I'm not sure if there was some reason that wasn't possible at the time, but what happened instead is that a special case was added for statement expressions, which over a series of patches got more and more convoluted.

This patch replaces the whole thing with an EvaluatedExprVisitor (another suggestion from @rjmccall). This is not only simpler but also more precise, eliminating false-positive self-captures, including:

  • Block literals in unevaluated contexts (sizeof(^{ ... })).

  • Some types of statements inside GNU statement expressions.

This patch also moves the self-capture check earlier in the compilation process, from CodeGen to Sema. This is necessary in order to support the aforementioned warning.
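
A rough sketch of what an EvaluatedExprVisitor-based check looks like (simplified, with illustrative names; not the exact code from the patch):

#include "clang/AST/EvaluatedExprVisitor.h"

using namespace clang;

namespace {
// Walks only the evaluated subexpressions of an initializer, so block
// literals inside sizeof() and other unevaluated operands are skipped.
class SelfCaptureFinder : public EvaluatedExprVisitor<SelfCaptureFinder> {
  const VarDecl *Var;

public:
  bool Found = false;

  SelfCaptureFinder(ASTContext &Ctx, const VarDecl *VD)
      : EvaluatedExprVisitor<SelfCaptureFinder>(Ctx), Var(VD) {}

  void VisitBlockExpr(BlockExpr *BE) {
    // Captures are already recorded on the BlockDecl, so there is no
    // need to descend into the block's body.
    for (const BlockDecl::Capture &Cap : BE->getBlockDecl()->captures())
      if (Cap.getVariable() == Var)
        Found = true;
  }
};
} // namespace

Sema can then run the visitor over the initializer and, if Found is set, call setCapturedByOwnInit() on the variable.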

Constant initialization

This patch also fixes a related bug where self-capturing __block variables could trigger this assert:

assert(!capturedByInit && "constant init contains a capturing block?");

The assert is important, because constant initialization doesn't support the drillIntoBlockVariable mechanism. Now, a constant initializer obviously can't include a call to Block_copy; in fact, it can't include block literals at all unless they have no captures. But I found two cases where the assert can be triggered anyway.

First, the initializer can syntactically contain a capturing block literal but never evaluate it:

__block constexpr int foo = true ? 1 : ^{ return foo; }();

This is addressed in this patch by forcing constant-initialized variables to be treated as non-self-capturing.

Second, the initializer can simply not be constant. With variables of types like const int, Clang first tries to constant-evaluate the initializer, then falls back to runtime initialization if that fails. But the assert is located in a place where it fires based merely on the attempt. This is addressed in this patch by moving the assert down and conditioning it on constant evaluation actually succeeding.
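
A hypothetical reproducer for this second case (assuming C++ with -fblocks, so that Clang attempts constant evaluation for the const-qualified type):

// Constant evaluation of the initializer fails (it calls a block), and
// Clang falls back to runtime initialization; the old assert fired
// merely because the attempt was made while capturedByInit was set.
__block const int x = ^{ return x; }();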


Patch is 33.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89475.diff

12 Files Affected:

  • (modified) clang/include/clang/AST/Decl.h (+20-2)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+6)
  • (modified) clang/lib/AST/Decl.cpp (+13)
  • (modified) clang/lib/CodeGen/CGBlocks.cpp (+57-24)
  • (modified) clang/lib/CodeGen/CGBlocks.h (+1)
  • (modified) clang/lib/CodeGen/CGDecl.cpp (+32-82)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+11-1)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+6-2)
  • (modified) clang/lib/Sema/Sema.cpp (+65)
  • (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+1)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+4-2)
  • (modified) clang/test/CodeGenCXX/block-capture.cpp (+124-3)
diff --git a/clang/include/clang/AST/Decl.h b/clang/include/clang/AST/Decl.h
index 8b121896d66d15..dc816786d7f11d 100644
--- a/clang/include/clang/AST/Decl.h
+++ b/clang/include/clang/AST/Decl.h
@@ -1103,6 +1103,11 @@ class VarDecl : public DeclaratorDecl, public Redeclarable<VarDecl> {
 
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsCXXCondDecl : 1;
+
+    /// Whether this is a __block variable that is captured by a block
+    /// referenced in its own initializer.
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned CapturedByOwnInit : 1;
   };
 
   union {
@@ -1588,10 +1593,23 @@ class VarDecl : public DeclaratorDecl, public Redeclarable<VarDecl> {
   /// escaping block.
   bool isNonEscapingByref() const;
 
-  void setEscapingByref() {
-    NonParmVarDeclBits.EscapingByref = true;
+  void setEscapingByref();
+
+  /// Indicates this is a __block variable that is captured by a block
+  /// referenced in its own initializer.
+  bool isCapturedByOwnInit() const {
+    return !isa<ParmVarDecl>(this) && NonParmVarDeclBits.CapturedByOwnInit;
   }
 
+  void setCapturedByOwnInit() {
+    assert(!isa<ParmVarDecl>(this));
+    NonParmVarDeclBits.CapturedByOwnInit = true;
+  }
+
+  /// Check if this is a __block variable which needs to be initialized
+  /// directly on the heap.
+  bool needsInitOnHeap() const;
+
   bool isCXXCondDecl() const {
     return isa<ParmVarDecl>(this) ? false : NonParmVarDeclBits.IsCXXCondDecl;
   }
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 1bbe76ff6bd2ac..f19ee8fbc21f3a 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2356,6 +2356,12 @@ def note_var_fixit_add_initialization : Note<
 def note_uninit_fixit_remove_cond : Note<
   "remove the %select{'%1' if its condition|condition if it}0 "
   "is always %select{false|true}2">;
+def warn_implicit_block_var_alloc : Warning<
+  "variable %q0 will be initialized in a heap allocation">,
+  InGroup<DiagGroup<"implicit-block-var-alloc">>, DefaultIgnore;
+def note_because_captured_by_block : Note<
+  "because it is captured by a block used in its own initializer">;
+
 def err_init_incomplete_type : Error<"initialization of incomplete type %0">;
 def err_list_init_in_parens : Error<
   "cannot initialize %select{non-class|reference}0 type %1 with a "
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 474e0ccde5bbf7..fda9010ba126d7 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -2680,6 +2680,19 @@ bool VarDecl::isNonEscapingByref() const {
   return hasAttr<BlocksAttr>() && !NonParmVarDeclBits.EscapingByref;
 }
 
+void VarDecl::setEscapingByref() {
+  assert(!isa<ParmVarDecl>(this) && hasAttr<BlocksAttr>());
+  NonParmVarDeclBits.EscapingByref = true;
+}
+
+bool VarDecl::needsInitOnHeap() const {
+  // We need direct initialization on the heap for self-capturing __block
+  // variables of types that cause CodeGenFunction::getEvaluationKind to return
+  // TEK_Aggregate.  The only such types that can be captured are records.
+  return isCapturedByOwnInit() &&
+         getType().getAtomicUnqualifiedType()->isRecordType();
+}
+
 bool VarDecl::hasDependentAlignment() const {
   QualType T = getType();
   return T->isDependentType() || T->isUndeducedType() ||
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index 2742c39965b2c8..4bea057a15a52c 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -45,6 +45,13 @@ CGBlockInfo::CGBlockInfo(const BlockDecl *block, StringRef name)
     name = name.substr(1);
 }
 
+BlockByrefHelpers::BlockByrefHelpers(const BlockByrefInfo &info)
+    : CopyHelper(nullptr), DisposeHelper(nullptr),
+      // The alignment we care about for the purposes of uniquing byref
+      // helpers is the alignment of the actual byref value field.
+      Alignment(info.ByrefAlignment.alignmentAtOffset(info.FieldOffset)),
+      ForceNoopCopy(info.ForceNoopCopy) {}
+
 // Anchor the vtable to this translation unit.
 BlockByrefHelpers::~BlockByrefHelpers() {}
 
@@ -2127,8 +2134,8 @@ class ObjectByrefHelpers final : public BlockByrefHelpers {
   BlockFieldFlags Flags;
 
 public:
-  ObjectByrefHelpers(CharUnits alignment, BlockFieldFlags flags)
-    : BlockByrefHelpers(alignment), Flags(flags) {}
+  ObjectByrefHelpers(const BlockByrefInfo &info, BlockFieldFlags flags)
+      : BlockByrefHelpers(info), Flags(flags) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2161,7 +2168,7 @@ class ObjectByrefHelpers final : public BlockByrefHelpers {
 /// Emits the copy/dispose helpers for an ARC __block __weak variable.
 class ARCWeakByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCWeakByrefHelpers(CharUnits alignment) : BlockByrefHelpers(alignment) {}
+  ARCWeakByrefHelpers(const BlockByrefInfo &info) : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2182,7 +2189,7 @@ class ARCWeakByrefHelpers final : public BlockByrefHelpers {
 /// that's not of block-pointer type.
 class ARCStrongByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCStrongByrefHelpers(CharUnits alignment) : BlockByrefHelpers(alignment) {}
+  ARCStrongByrefHelpers(const BlockByrefInfo &info) : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2218,8 +2225,8 @@ class ARCStrongByrefHelpers final : public BlockByrefHelpers {
 /// variable that's of block-pointer type.
 class ARCStrongBlockByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCStrongBlockByrefHelpers(CharUnits alignment)
-    : BlockByrefHelpers(alignment) {}
+  ARCStrongBlockByrefHelpers(const BlockByrefInfo &info)
+      : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2248,9 +2255,9 @@ class CXXByrefHelpers final : public BlockByrefHelpers {
   const Expr *CopyExpr;
 
 public:
-  CXXByrefHelpers(CharUnits alignment, QualType type,
+  CXXByrefHelpers(const BlockByrefInfo &info, QualType type,
                   const Expr *copyExpr)
-    : BlockByrefHelpers(alignment), VarType(type), CopyExpr(copyExpr) {}
+    : BlockByrefHelpers(info), VarType(type), CopyExpr(copyExpr) {}
 
   bool needsCopy() const override { return CopyExpr != nullptr; }
   void emitCopy(CodeGenFunction &CGF, Address destField,
@@ -2276,8 +2283,8 @@ class NonTrivialCStructByrefHelpers final : public BlockByrefHelpers {
   QualType VarType;
 
 public:
-  NonTrivialCStructByrefHelpers(CharUnits alignment, QualType type)
-    : BlockByrefHelpers(alignment), VarType(type) {}
+  NonTrivialCStructByrefHelpers(const BlockByrefInfo &info, QualType type)
+      : BlockByrefHelpers(info), VarType(type) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2336,7 +2343,7 @@ generateByrefCopyHelper(CodeGenFunction &CGF, const BlockByrefInfo &byrefInfo,
     // Create a scope with an artificial location for the body of this function.
   auto AL = ApplyDebugLocation::CreateArtificial(CGF);
 
-  if (generator.needsCopy()) {
+  if (generator.needsCopy() && !generator.ForceNoopCopy) {
     // dst->x
     Address destField = CGF.GetAddrOfLocalVar(&Dst);
     destField = Address(CGF.Builder.CreateLoad(destField), byrefInfo.Type,
@@ -2458,18 +2465,13 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
 
   auto &byrefInfo = getBlockByrefInfo(&var);
 
-  // The alignment we care about for the purposes of uniquing byref
-  // helpers is the alignment of the actual byref value field.
-  CharUnits valueAlignment =
-    byrefInfo.ByrefAlignment.alignmentAtOffset(byrefInfo.FieldOffset);
-
   if (const CXXRecordDecl *record = type->getAsCXXRecordDecl()) {
     const Expr *copyExpr =
         CGM.getContext().getBlockVarCopyInit(&var).getCopyExpr();
     if (!copyExpr && record->hasTrivialDestructor()) return nullptr;
 
     return ::buildByrefHelpers(
-        CGM, byrefInfo, CXXByrefHelpers(valueAlignment, type, copyExpr));
+        CGM, byrefInfo, CXXByrefHelpers(byrefInfo, type, copyExpr));
   }
 
   // If type is a non-trivial C struct type that is non-trivial to
@@ -2477,7 +2479,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
   if (type.isNonTrivialToPrimitiveDestructiveMove() == QualType::PCK_Struct ||
       type.isDestructedType() == QualType::DK_nontrivial_c_struct)
     return ::buildByrefHelpers(
-        CGM, byrefInfo, NonTrivialCStructByrefHelpers(valueAlignment, type));
+        CGM, byrefInfo, NonTrivialCStructByrefHelpers(byrefInfo, type));
 
   // Otherwise, if we don't have a retainable type, there's nothing to do.
   // that the runtime does extra copies.
@@ -2499,7 +2501,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
     // byref routines.
     case Qualifiers::OCL_Weak:
       return ::buildByrefHelpers(CGM, byrefInfo,
-                                 ARCWeakByrefHelpers(valueAlignment));
+                                 ARCWeakByrefHelpers(byrefInfo));
 
     // ARC __strong __block variables need to be retained.
     case Qualifiers::OCL_Strong:
@@ -2507,13 +2509,13 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
       // transfer possible.
       if (type->isBlockPointerType()) {
         return ::buildByrefHelpers(CGM, byrefInfo,
-                                   ARCStrongBlockByrefHelpers(valueAlignment));
+                                   ARCStrongBlockByrefHelpers(byrefInfo));
 
       // Otherwise, we transfer ownership of the retain from the stack
       // to the heap.
       } else {
         return ::buildByrefHelpers(CGM, byrefInfo,
-                                   ARCStrongByrefHelpers(valueAlignment));
+                                   ARCStrongByrefHelpers(byrefInfo));
       }
     }
     llvm_unreachable("fell out of lifetime switch!");
@@ -2533,7 +2535,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
     flags |= BLOCK_FIELD_IS_WEAK;
 
   return ::buildByrefHelpers(CGM, byrefInfo,
-                             ObjectByrefHelpers(valueAlignment, flags));
+                             ObjectByrefHelpers(byrefInfo, flags));
 }
 
 Address CodeGenFunction::emitBlockByrefAddress(Address baseAddr,
@@ -2653,11 +2655,39 @@ const BlockByrefInfo &CodeGenFunction::getBlockByrefInfo(const VarDecl *D) {
   info.FieldOffset = varOffset;
   info.ByrefAlignment = std::max(varAlign, getPointerAlign());
 
+  // If we're initializing directly on the heap, then we should emit a
+  // no-op copy helper, both because we don't need a real one (the
+  // object will never move), and because a real one would break the
+  // pre-init _Block_object_assign.
+  info.ForceNoopCopy = D->needsInitOnHeap();
+
   auto pair = BlockByrefInfos.insert({D, info});
   assert(pair.second && "info was inserted recursively?");
   return pair.first->second;
 }
 
+void CodeGenFunction::emitByrefInitOnHeap(llvm::Value *P) {
+  // The object itself is initialized directly on the heap.  But for ABI
+  // backwards compatibility reasons, we have to set up a fake byref struct on
+  // the stack (with the structural components initialized but not the object
+  // itself), then call _Block_object_assign to move it to the heap (which is
+  // safe because we forced a no-op copy helper), then call
+  // _Block_object_dispose to release the extra ref from _Block_object_assign.
+  //
+  // 'P' points to the fake byref struct.
+
+  BlockFieldFlags flags = BLOCK_FIELD_IS_BYREF;
+  // Ignored out-parameter.  We'll use the forwarding pointer instead.
+  RawAddress out = CreateDefaultAlignTempAlloca(P->getType(), "initOnHeap.out");
+
+  llvm::Value *args[] = {Builder.CreateBitCast(out.getPointer(), VoidPtrTy),
+                         Builder.CreateBitCast(P, VoidPtrTy),
+                         llvm::ConstantInt::get(Int32Ty, flags.getBitMask())};
+  EmitNounwindRuntimeCall(CGM.getBlockObjectAssign(), args);
+
+  BuildBlockRelease(P, flags, false);
+}
+
 /// Initialize the structural components of a __block variable, i.e.
 /// everything but the actual object.
 void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
@@ -2699,8 +2729,8 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
   storeHeaderField(V, getPointerSize(), "byref.isa");
 
   // Store the address of the variable into its own forwarding pointer.
-  storeHeaderField(addr.emitRawPointer(*this), getPointerSize(),
-                   "byref.forwarding");
+  llvm::Value *pointer = addr.emitRawPointer(*this);
+  storeHeaderField(pointer, getPointerSize(), "byref.forwarding");
 
   // Blocks ABI:
   //   c) the flags field is set to either 0 if no helper functions are
@@ -2764,6 +2794,9 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
     auto layoutInfo = CGM.getObjCRuntime().BuildByrefLayout(CGM, type);
     storeHeaderField(layoutInfo, getPointerSize(), "byref.layout");
   }
+
+  if (emission.NeedsInitOnHeap)
+    emitByrefInitOnHeap(pointer);
 }
 
 void CodeGenFunction::BuildBlockRelease(llvm::Value *V, BlockFieldFlags flags,
diff --git a/clang/lib/CodeGen/CGBlocks.h b/clang/lib/CodeGen/CGBlocks.h
index 8d10c4f69b2026..526dbbfd9b38d3 100644
--- a/clang/lib/CodeGen/CGBlocks.h
+++ b/clang/lib/CodeGen/CGBlocks.h
@@ -139,6 +139,7 @@ class BlockByrefInfo {
   unsigned FieldIndex;
   CharUnits ByrefAlignment;
   CharUnits FieldOffset;
+  bool ForceNoopCopy;
 };
 
 /// Represents a type of copy/destroy operation that should be performed for an
diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
index ce6d6d8956076e..954dd1079215fd 100644
--- a/clang/lib/CodeGen/CGDecl.cpp
+++ b/clang/lib/CodeGen/CGDecl.cpp
@@ -1450,6 +1450,8 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) {
 
   bool isEscapingByRef = D.isEscapingByref();
   emission.IsEscapingByRef = isEscapingByRef;
+  emission.IsCapturedByOwnInit = D.isCapturedByOwnInit();
+  emission.NeedsInitOnHeap = D.needsInitOnHeap();
 
   CharUnits alignment = getContext().getDeclAlign(&D);
 
@@ -1692,68 +1694,6 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) {
   return emission;
 }
 
-static bool isCapturedBy(const VarDecl &, const Expr *);
-
-/// Determines whether the given __block variable is potentially
-/// captured by the given statement.
-static bool isCapturedBy(const VarDecl &Var, const Stmt *S) {
-  if (const Expr *E = dyn_cast<Expr>(S))
-    return isCapturedBy(Var, E);
-  for (const Stmt *SubStmt : S->children())
-    if (isCapturedBy(Var, SubStmt))
-      return true;
-  return false;
-}
-
-/// Determines whether the given __block variable is potentially
-/// captured by the given expression.
-static bool isCapturedBy(const VarDecl &Var, const Expr *E) {
-  // Skip the most common kinds of expressions that make
-  // hierarchy-walking expensive.
-  E = E->IgnoreParenCasts();
-
-  if (const BlockExpr *BE = dyn_cast<BlockExpr>(E)) {
-    const BlockDecl *Block = BE->getBlockDecl();
-    for (const auto &I : Block->captures()) {
-      if (I.getVariable() == &Var)
-        return true;
-    }
-
-    // No need to walk into the subexpressions.
-    return false;
-  }
-
-  if (const StmtExpr *SE = dyn_cast<StmtExpr>(E)) {
-    const CompoundStmt *CS = SE->getSubStmt();
-    for (const auto *BI : CS->body())
-      if (const auto *BIE = dyn_cast<Expr>(BI)) {
-        if (isCapturedBy(Var, BIE))
-          return true;
-      }
-      else if (const auto *DS = dyn_cast<DeclStmt>(BI)) {
-          // special case declarations
-          for (const auto *I : DS->decls()) {
-              if (const auto *VD = dyn_cast<VarDecl>((I))) {
-                const Expr *Init = VD->getInit();
-                if (Init && isCapturedBy(Var, Init))
-                  return true;
-              }
-          }
-      }
-      else
-        // FIXME. Make safe assumption assuming arbitrary statements cause capturing.
-        // Later, provide code to poke into statements for capture analysis.
-        return true;
-    return false;
-  }
-
-  for (const Stmt *SubStmt : E->children())
-    if (isCapturedBy(Var, SubStmt))
-      return true;
-
-  return false;
-}
-
 /// Determine whether the given initializer is trivial in the sense
 /// that it requires no code to be generated.
 bool CodeGenFunction::isTrivialInitializer(const Expr *Init) {
@@ -1918,8 +1858,7 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   // Check whether this is a byref variable that's potentially
   // captured and moved by its own initializer.  If so, we'll need to
   // emit the initializer first, then copy into the variable.
-  bool capturedByInit =
-      Init && emission.IsEscapingByRef && isCapturedBy(D, Init);
+  bool capturedByInit = emission.IsCapturedByOwnInit;
 
   bool locIsByrefHeader = !capturedByInit;
   const Address Loc =
@@ -1951,8 +1890,9 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   llvm::Constant *constant = nullptr;
   if (emission.IsConstantAggregate ||
       D.mightBeUsableInConstantExpressions(getContext())) {
-    assert(!capturedByInit && "constant init contains a capturing block?");
     constant = ConstantEmitter(*this).tryEmitAbstractForInitializer(D);
+    assert(!(constant && capturedByInit) &&
+           "constant init contains a capturing block?");
     if (constant && !constant->isZeroValue() &&
         (trivialAutoVarInit !=
          LangOptions::TrivialAutoVarInitKind::Uninitialized)) {
@@ -1975,6 +1915,10 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
     initializeWhatIsTechnicallyUninitialized(Loc);
     LValue lv = MakeAddrLValue(Loc, type);
     lv.setNonGC(true);
+    if (emission.NeedsInitOnHeap) {
+      drillIntoBlockVariable(*this, lv, &D);
+      capturedByInit = false; // We don't need to drill again.
+    }
     return EmitExprAsInit(Init, &D, lv, capturedByInit);
   }
 
@@ -2023,6 +1967,9 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
     return;
   }
   case TEK_Aggregate:
+    assert(
+        !capturedByInit &&
+        "Capture-by-initializer should have been handled by NeedsInitOnHeap!");
     if (type->isAtomicType()) {
       EmitAtomicInit(const_cast<Expr*>(init), lvalue);
     } else {
@@ -2031,7 +1978,6 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
         Overlap = AggValueSlot::DoesNotOverlap;
       else if (auto *FD = dyn_cast<FieldDecl>(D))
         Overlap = getOverlapForFieldInit(FD);
-      // TODO: how can we delay here if D is captured by its initializer?
       EmitAggExpr(init, AggValueSlot::forLValue(
                             lvalue, *this, AggValueSlot::IsDestructed,
                             AggValueSlot::DoesNotNeedGCBarriers,
@@ -2120,27 +2066,31 @@ void CodeGenFunction::EmitAutoVarCleanups(const AutoVarEmission &emission) {
   // us from jumping into any of these scopes anyway.
   if (!HaveInsertPoint()) return;
 
-  const VarDecl &D = *emission.Variable;
+  // If we're initializing directly on the heap, _Block_object_destroy will
+  // handle destruction, so we don't need to perform cleanups directly.
+  if (!emission.NeedsInitOnHeap) {
+    const VarDecl &D = *emission.Variable;
 
-  // Check the type for a cleanup.
-  if (QualType::DestructionKind dtorKind = D.needsDestruction(getContext()))
-    emitAutoVarTypeCleanup(emission, dtorKind);
+    // Check the type for a cleanup.
+    if (QualType::DestructionKind dtorKind = D.needsDestruction(getContext()))
+      emitAutoVarTypeCleanup(emission, dtorKind);
 
-  // In GC mode, honor objc_precise_lifetime.
-  if (getLangOpts().getGC() != LangOptions::NonGC &&
-      D.hasAttr<ObjCPreciseLifetimeAttr>()) {
-    EHStack.pushCleanup<ExtendGCLifetime>(NormalCleanup, &D);
-  }
+    // In GC mode, honor objc_precise_lifetime.
+    if (getLangOpts().getGC() != LangOptions::NonGC &&
+        D.hasAttr<ObjCPreciseLifetimeAttr>()) {
+      EHStack.pushCleanup<ExtendGCLifetime>(NormalCleanup, &D);
+    }
 
-  // Handle the cleanup attribute.
-  if (const CleanupAttr *CA = D.getAttr<CleanupAttr>()) {
-    const FunctionDecl *FD = CA->getFunctionDec...
[truncated]

Footnotes

  1. Supporting such code appears to be required by the C++ spec (see
    https://timsong-cpp.github.io/cppwp/n4861/dcl.init.aggr#6), though not
    by the C spec (see https://port70.net/~nsz/c/c11/n1570.html#6.7.9p23).

  2. There are other TEK_Aggregate types besides records, but all such
    types either can't be captured by blocks (arrays) or can't be the
    type of a local variable at all.

  3. Though even in C, moving mid-initializer could break code that took the address of the variable:

    struct Foo { void *a; void *(^b)(void); };
    __block struct Foo x = {&x.b, Block_copy(^{ return x.a; })};
    assert(x.a == &x.b); // fails without up-front heap allocation
    

    However, it's a bit less obvious that such code needs to work; perhaps the user should just be aware the variable will be moving. It's not like C++ where construction is clearly an indivisible operation performed at a single address.

@ille-apple (Author)

Rebased, because the original version failed CI due to a regression on main (#89476).

github-actions bot commented Apr 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ille-apple (Author)

Ping :)

rdar://70498802

@ille-apple (Author)

Rebased due to merge conflict.

@ChuanqiXu9 (Member) left a comment

The serialization-related change looks trivial and good, but I'd prefer to leave the formal approval to the CodeGen reviewers.

@ille-apple (Author)

Weekly ping.

@rjmccall (Contributor) left a comment

This generally looks okay, but see the comment about exception safety.

// Otherwise, we transfer ownership of the retain from the stack
// to the heap.

@rjmccall (Contributor):

The indentation here is correct — it's a continuation of the same thought as the comment before the if.

@ille-apple (Author):

Makes sense, but the re-indentation came from clang-format (because I modified the code next to it), and not doing it will upset the CI checks.

How about I move the first comment after the if and indent both of them?

      if (type->isBlockPointerType()) {
        // Block pointers need to be copied, and there's no direct
        // transfer possible.
        return ::buildByrefHelpers(CGM, byrefInfo,
                                   ARCStrongBlockByrefHelpers(byrefInfo));
      } else {
        // Otherwise, we transfer ownership of the retain from the stack
        // to the heap.
        return ::buildByrefHelpers(CGM, byrefInfo,
                                   ARCStrongByrefHelpers(byrefInfo));
      }

RawAddress out = CreateDefaultAlignTempAlloca(P->getType(), "initOnHeap.out");

llvm::Value *args[] = {Builder.CreateBitCast(out.getPointer(), VoidPtrTy),
Builder.CreateBitCast(P, VoidPtrTy),

@rjmccall (Contributor):

These bitcasts should no longer be necessary, right? Do we have a mode that still generates typed pointers in IR?

If you're trying to handle address-space differences, this is the wrong way to do it.

@ille-apple (Author):

Ah, this is an artifact of the change having originally been written in 2020. Looks like I can just drop the bitcast. I don't think address-space differences are possible.

@@ -2120,27 +2066,33 @@ void CodeGenFunction::EmitAutoVarCleanups(const AutoVarEmission &emission) {
// us from jumping into any of these scopes anyway.
if (!HaveInsertPoint()) return;

const VarDecl &D = *emission.Variable;
// If we're initializing directly on the heap, _Block_object_destroy will
// handle destruction, so we don't need to perform cleanups directly.

@rjmccall (Contributor):

I think the right way to think of this is that the stack copy of the byref is always uninitialized in this situation, so there's nothing for us to destroy. Basically, we make an uninitialized byref structure on the stack, ask the runtime to move it to the heap with a trivial move function, and then initialize the object on the heap directly.

@ille-apple (Author):

Indeed... and looking at this again, I put too much code inside the if statement.

Not only that, there's an existing bug here when combining __attribute__((cleanup)) with __block: if the variable was copied to the heap, the cleanup function will be called after the byref allocation has already been freed.

Will fix.

auto pair = BlockByrefInfos.insert({D, info});
assert(pair.second && "info was inserted recursively?");
return pair.first->second;
}

void CodeGenFunction::emitByrefInitOnHeap(llvm::Value *P) {

@rjmccall (Contributor):

This isn't actually doing the initialization, so I think this is misnamed. This is just setting up for the initialization by allocating a byref on the heap.

@ille-apple (Author):

Good point. I'll change the name to emitByrefHeapAlloc.

@@ -2764,6 +2794,9 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
auto layoutInfo = CGM.getObjCRuntime().BuildByrefLayout(CGM, type);
storeHeaderField(layoutInfo, getPointerSize(), "byref.layout");
}

if (emission.NeedsInitOnHeap)
emitByrefInitOnHeap(pointer);

@rjmccall (Contributor):

So, this is not exception-safe, right? If the initialization throws, we have no way to release the byref object we're allocating here.

Now, it looks like the blocks runtime is itself not exception-safe, so maybe we can hand-wave this? _Block_byref_assign_copy allocates, calls the copy operation, and makes no effort to deallocate on a throw. On the other hand, the byref "copy" operation is typically a call to a move constructor, so it is a lot less likely in practice that it'll throw than that the original initialization will.

The only way I can think of to handle this would be to actually put a "successfully initialized" flag in the byref payload and then no-op the destructor if it isn't set. That's implementable but pretty awful. So maybe we should just acknowledge it's not exception-safe and suggest that the blocks runtime add new APIs to support this properly. What do you think?

@ille-apple (Author):

Argh, you're right.

In principle the BLOCK_HAS_COPY_DISPOSE flag could be used for that purpose, instead of creating a separate field in the payload. It could start out unset (making the runtime skip the call to byref_dispose), and then be set after initialization.

But the initialization might send the block capturing the object to another thread. If initialization then throws, I assume that nobody will subsequently call the block, or if they do, the block won't actually try to use the object. That's straightforward UB. But it doesn't seem like it should be UB just to send the block. So there might be concurrent calls to _Block_object_dispose. Not only that, 'our' _Block_object_dispose call (the one that would hypothetically be added on the cleanup path to fix the memory leak) might not be the one to drop the last reference.

Therefore, flags would have to be modified atomically, particularly because the same word is also used for the reference count. And even with an atomic operation, you would still have theoretical UB in the blocks runtime when it reads flags non-atomically. Though, not any more UB than it's already incurring thanks to potential concurrent refcount modifications... But what if an alternate block runtime tries to cache flags? The only alternate runtime I know of is GNUstep's and it looks like it doesn't cache it, but this still feels very dubious.

So then there's the alternative you suggested of just acknowledging that this isn't exception-safe (specifically, that it leaks memory if initialization throws). But that makes this whole patch feel like an awkward middle ground. If self-capturing __block variables are not going to be supported properly (pending new APIs), then maybe they shouldn't be supported at all and should just be an error (pending new APIs).

But if new APIs are added, teaching Clang how to recognize when the target OS supports them would be quite a lot of work for what is ultimately an edge case.

I'm going to see how ugly it ends up being to add a "successfully initialized" flag.

@ille-apple (Author):

Updated to add the flag. It's not all that bad.

In fact, if, as I suggested in my last comment, we want to support the byref lingering after unwinding, then there must be some kind of runtime state to track whether whoever drops the last reference needs to run the destructor. It would be more natural for the Blocks runtime to track that state rather than relying on compiler-generated code, but the state has to live somewhere.

If, on the other hand, we don't want to support the byref lingering after unwinding, then I could go back to the idea of mutating the existing flags field. In that scenario, the existence of runtime state would be unnecessary, but the implementation would be relatively unobtrusive.
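
For reference, a minimal sketch of what the flag-based dispose helper amounts to (the layout and names are purely illustrative, not the generated code):

struct Foo { int member; };
void destroy_Foo(struct Foo *);  // placeholder for the real destructor

struct byref_payload {
  _Bool initialized;  // set only after the object is fully constructed
  struct Foo object;
};

static void byref_dispose_helper(void *p) {
  struct byref_payload *payload = p;
  if (!payload->initialized)
    return;  // unwound before init completed: nothing to destroy
  destroy_Foo(&payload->object);
}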

- Revamp heap-init dispose helper to use a conditional flag.

- Fix __attribute__((cleanup)).

- Revert the changes from the first commit in this PR that involve
  copying ForceNoopCopy to BlockByrefHelpers.  Upon review, this turned
  out to be unnecessary.