Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CodeGen][BugFix] use proper base type for derived class #404

Merged
merged 3 commits into from
Jan 31, 2024

Conversation

gitoleg
Copy link
Collaborator

@gitoleg gitoleg commented Jan 18, 2024

In the original codegen a new type is created for the base class, while in CIR we were rewriting the type being processed (due tp misused pointers). This PR fix this, and also makes CIR codegen even with the original one.

@bcardosolopes
Copy link
Member

Thanks for working on this.

In the original codegen a new type is created for the base class, while in CIR we were rewriting the type being processed (due tp misused pointers). This PR fix this, and also makes CIR codegen even with the original one.

I had several problems with the pointers/pointees in that section of the code in the past, I had some fun with an ASANified version of clang in order to figure out. So I'm a bit scared with this change (let me see if I can find the previous commits in question).

However, that was before @sitio-couto changed the StructType to be mutable, so perhaps the problems I fixed in the past are also gone because of his change. I like the end result of this change, which is keeping the base class type around. @sitio-couto since you touched this last, thoughts?

Copy link
Collaborator

@sitio-couto sitio-couto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcardosolopes @gitoleg I'm not sure I agree with this change.

we were rewriting the type being processed (due tp misused pointers).

@gitoleg, can you clarify this comment? I'm concerned because StructType shouldn't change post-completion (see CIRGenBuilder::getCompleteStructTy and StructType::complete).

In the original codegen a new type is created for the base class

Regarding the base class's new type in the original codegen, I thought it was just for alignment (e.g., example). With this change, there's a lack of padding info for class.A.base, and seems premature. Maybe It would be better suited at the lowering phase. Overall, this seems to just replicate a struct type without any clear advantages. @gitoleg am I missing something?

@bcardosolopes while I don't see many benefits here, I don't know much about codegen for virtual methods/classes, so take it with a grain of salt.

clang/test/CIR/CodeGen/derived-to-base.cpp Outdated Show resolved Hide resolved
@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 22, 2024

well, I will do my best to explain!
let's start from the original code in order to understand what's going on and what I meant by "misused pointer"

std::unique_ptr<CGRecordLayout>
CodeGenTypes::ComputeRecordLayout(const RecordDecl *D, llvm::StructType *Ty) {
  ...
1  llvm::StructType *BaseTy = nullptr;
2  if (isa<CXXRecordDecl>(D)) {
3    BaseTy = Ty;
4    if (Builder.Layout.getNonVirtualSize() != Builder.Layout.getSize()) {
5      CGRecordLowering BaseBuilder(*this, D, /*Packed=*/Builder.Packed);
6      BaseBuilder.lower(/*NonVirtualBaseType=*/true);
7      BaseTy = llvm::StructType::create(
8          getLLVMContext(), BaseBuilder.FieldTypes, "", BaseBuilder.Packed);
       ....
    }
  }

line №1: Base type pointer is initialized to nullptr
line №3: Base type pointer is assigned to the pointer to a type of a structure being processed, i.e. these pointer point to the same types
line №7: Base type pointer assigned again to a newly created type. So now Ty and BaseTy are different pointers! and they point to the different types as well.

now let's take a look at our code

CIRGenTypes::computeRecordLayout(const RecordDecl *D,
                                 mlir::cir::StructType *Ty) {
  ...
1  mlir::cir::StructType *BaseTy = nullptr;
2  if (llvm::isa<CXXRecordDecl>(D) && !D->isUnion() &&  !D->hasAttr<FinalAttr>()) {
3    BaseTy = Ty;
4    if (builder.astRecordLayout.getNonVirtualSize() !=  builder.astRecordLayout.getSize()) {
5      CIRRecordLowering baseBuilder(*this, D, /*Packed=*/builder.isPacked);
6      auto baseIdentifier = getRecordTypeName(D, ".base");
7      *BaseTy =
          Builder.getCompleteStructTy(baseBuilder.fieldTypes, baseIdentifier,
                                      /*packed=*/false, D);
  ....
10     *Ty =
             Builder.getCompleteStructTy(builder.fieldTypes, getRecordTypeName(D, ""),
                                  /*packed=*/false, D);
  ....

line №1: Base type pointer is initialized to nullptr (same)
line №3: Base type pointer assigned to the pointer to a type of a structure being processed, i.e. these pointer point to the same types (same)
line №7: An object pointed by BaseTy (i.e. the same type Ty) is assigned to a newly created type. Now BaseTy and Ty are the same types.
line №10: An object pointed by Ty (which is base type at this moment) is assigned to a newly created type again which erases the previously created base type.

Finally. Yes, (as far as I understand), base class is used for alignment, when the class itself need to be padded with zeros and derived classes don't need this padding. But it's the next story - this PR is just bug fixing and I have the PR that adds all the unimplemented functions like insertPadding and etc.

@sitio-couto I don't know if I explained clear enough, so please let me know if I did not :)

@sitio-couto
Copy link
Collaborator

@gitoleg thanks for the in-depth explanation! Now I understand (nice catch BTW)!

But it's the next story - this PR is just bug fixing and I have the PR that adds all the unimplemented functions like insertPadding and etc.

Shouldn't the new buildBase.lower call be in this "next story" then? It seems like the fix and the codegen of the A.base type are two different things.

Yes, (as far as I understand), base class is used for alignment, when the class itself need to be padded with zeros and derived classes don't need this padding.

In this case, have you considered deferring this to the lowering phase? I don't see why we would need this low-level information in CIR. It might even complicate passes since they will have to consider different versions of the same class type. @bcardosolopes seems like a case of early optimization, WDYT?

@bcardosolopes
Copy link
Member

Finally. Yes, (as far as I understand), base class is used for alignment, when the class itself need to be padded with zeros and derived classes don't need this padding. But it's the next story - this PR is just bug fixing and I have the PR that adds all the unimplemented functions like insertPadding and etc.

@gitoleg thanks for the detailed explanation!

Shouldn't the new buildBase.lower call be in this "next story" then? It seems like the fix and the codegen of the A.base type are two different things.

The fact that we were missing the BaseBuilder.lower(/*NonVirtualBaseType=*/true); doesn't sound intentional (sounds more like a bug to me).

In this case, have you considered deferring this to the lowering phase? I don't see why we would need this low-level information in CIR. It might even complicate passes since they will have to consider different versions of the same class type. @bcardosolopes seems like a case of early optimization, WDYT?

That's an interesting point, I like the idea. However, if we go this direction, I believe we need to change StructType to have some extra mechanism to track padding more explicitly, because if we ask datalayout questions in a later pass (like for sizes of struct and stuff), we need it to give it the right answer even though it's before LoweringPrepare. Attributes would probably be one way to tag the struct with extra info. In LoweringPrepare we could then transform this attribute into actual members, so it matches whatever LLVM expects. Thoughts?

Anyways, I think this PR should be good to land before we decide how the other bits are getting introduced. Any objections @sitio-couto?

@sitio-couto
Copy link
Collaborator

Any objections @sitio-couto?

@bcardosolopes none.

Thoughts?

Sounds good.

@lanza lanza force-pushed the main branch 2 times, most recently from 4e069c6 to 79d4dc7 Compare January 29, 2024 23:34
@bcardosolopes
Copy link
Member

@lanza just rebased and this has the side effect of breaking GH PR workflow, making it impossible to review, apologies. Please rebase!

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 31, 2024

@lanza just rebased and this has the side effect of breaking GH PR workflow, making it impossible to review, apologies. Please rebase!

@bcardosolopes done!

@bcardosolopes bcardosolopes merged commit 0cef361 into llvm:main Jan 31, 2024
5 of 6 checks passed
lanza pushed a commit that referenced this pull request Mar 23, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
eZWALT pushed a commit to eZWALT/clangir that referenced this pull request Mar 24, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
lanza pushed a commit that referenced this pull request Apr 29, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
lanza pushed a commit that referenced this pull request Apr 29, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
eZWALT pushed a commit to eZWALT/clangir that referenced this pull request Apr 29, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
lanza pushed a commit that referenced this pull request Apr 29, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
bruteforceboy pushed a commit to bruteforceboy/clangir that referenced this pull request Oct 2, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
Hugobros3 pushed a commit to shady-gang/clangir that referenced this pull request Oct 2, 2024
In the original codegen a new type is created for the base class, while
in CIR we were rewriting the type being processed (due tp misused
pointers). This PR fix this, and also makes CIR codegen even with the
original one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants