Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CIRGen] Add complex type and its CIRGen support #513

Merged
merged 1 commit into from
Jul 3, 2024

Conversation

Lancern
Copy link
Collaborator

@Lancern Lancern commented Mar 17, 2024

This PR adds !cir.complex type to model the _Complex type in C. It also contains support for its CIRGen.

In detail, this patch adds the following CIR types, ops, and attributes:

  • The !cir.complex type is added to model the _Complex type in C. This type is parameterized with the type of the components of the complex number, which must be either an integer type or a floating-point type.
  • The #cir.complex attribute is added to represent a literal value of _Complex type. It is a struct-like attribute that provides the real and imaginary part of the literal _Complex value.
  • The #cir.imag attribute is added to represent a purely imaginary number.
  • The cir.complex.create op is added to create a complex value from its real and imaginary parts.
  • The cir.complex.real and cir.complex.imag op is added to extract the real and imaginary part of a value of !cir.complex type, respectively.
  • The cir.complex.real_ptr and cir.complex.imag_ptr op is added to derive a pointer to the real and imaginary part of a value of !cir.complex type, respectively.

CIRGen support for some of the fundamental complex number operations is also included. Note the implementation diverges from the original clang CodeGen, where expressions of complex types are handled differently from scalars and aggregates. Instead, this patch treats expressions of complex types as scalars, as such expressions can be simply lowered to a CIR value of !cir.complex type.

This PR addresses #445 .

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, thanks for working on it!

Comments inline but overall I'm just a bit confused by the fact that LLVM skeleton isn't being much followed here (and some comments on CIRGen decisions).

clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/lib/CIR/CodeGen/CIRGenCXX.cpp Outdated Show resolved Hide resolved
clang/lib/CIR/CodeGen/CIRGenClass.cpp Outdated Show resolved Hide resolved
clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp Outdated Show resolved Hide resolved
clang/lib/CIR/Dialect/IR/CIRAttrs.cpp Outdated Show resolved Hide resolved
clang/lib/CIR/Dialect/IR/CIRTypes.cpp Show resolved Hide resolved
clang/test/CIR/CodeGen/complex.c Outdated Show resolved Hide resolved
clang/test/CIR/CodeGen/complex.c Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
@Lancern
Copy link
Collaborator Author

Lancern commented Mar 18, 2024

@bcardosolopes Hi Bruno, thanks for your review! Let me explain in more details about why I'm not following the skeleton here.

The original clang CodeGen divide expressions into three categories according to their types: 1) scalar, 2) complex, and 3) aggregate. CodeGen handles these three categories of expressions differently:

  • CodeGen of a scalar expression yields a llvm::Value * that represent the prvalue result of the expression. Note that scalar expressions are always prvalue.
  • CodeGen of a complex expression yields a std::pair<llvm::Value *, llvm::Value *> that represent the prvalue result of the expression. The two llvm::Value * represent the real part and the imaginary part of the result complex value, respectively. Note that complex expressions are also always prvalue.
  • CodeGen of an aggregate expression does not yield anything. Instead, a destination address must be given to the CodeGen and the aggregate produced by the expression will be emitted into that location.

So the CodeGen of scalar expressions and complex expressions are actually very similar. I believe the reason why CodeGen distinguishes them is that CodeGen of scalar expressions yields llvm::Value * while CodeGen of complex expressions yields std::pair<llvm::Value *, llvm::Value *>.

OK, that's enough of background story. In CIR, what makes a difference is that instead of using something like std::pair<mlir::Value, mlir::Value> to represent a complex value, we now have the !cir.complex type which allows us to use a single mlir::Value to represent a complex value. In other words, in CIR, if we keep following the skeleton:

  • CIRGen of a scalar expression yields a mlir::Value that represents the prvalue result.
  • CIRGen of a complex expression also yields a mlir::Value that represents the prvalue result.
  • CIRGen of an aggregate expression keeps what it is.

If we follow the skeleton, the code in CIRGenExprScalar.cpp and CIRGenExprComplex.cpp (which hasn't been invented) would be very very similar to each other, resulting in unnecessary redundancy. Thus I decided to treat expressions of complex types as scalar expressions in CIRGen. This eases the implementation greatly and it still follow the intuition.

I hope this explanation is clear enough for you! If you are still confused please let me know.

@bcardosolopes
Copy link
Member

bcardosolopes commented Mar 19, 2024

Thanks for the detailed explanation (very clarifying) and for thinking about code duplication!

If we follow the skeleton, the code in CIRGenExprScalar.cpp and CIRGenExprComplex.cpp (which hasn't been invented) would be very very similar to each other, resulting in unnecessary redundancy. Thus I decided to treat expressions of complex types as scalar expressions in CIRGen. This eases the implementation greatly and it still follow the intuition.

clang/lib/CodeGen/CGExprComplex.cpp is quite big. Are you confident that current missing complex codegen will just apply with all scalar stuff? I have the feeling that as soon as something differs, we'll need to change scalar emitters to take complex types into account, and because we're assuming scalar handles it all, we won't have asserts to catch what's not currently implemented - so might end up with bad codegen which will take time to investigate and debug.

I believe duplication will lead us to converge faster and have less issues. How about you add a FIXME(cir): unlike traditional CodeGen, this shares common code with ScalarExprEmitter to every method that is the same between ComplexExprEmitter and ScalarExprEmitter, so that when we complete complex support, we can decide if there's a refactoring approach we could take?

@lanza lanza force-pushed the main branch 2 times, most recently from ed3955a to 8b7417c Compare March 23, 2024 05:07
@Lancern
Copy link
Collaborator Author

Lancern commented Mar 23, 2024

How about you add a FIXME(cir): unlike traditional CodeGen, this shares common code with ScalarExprEmitter to every method that is the same between ComplexExprEmitter and ScalarExprEmitter, so that when we complete complex support, we can decide if there's a refactoring approach we could take?

OK I believe this is also acceptable and it allows us to evaluate this divergence more deeply before we can make further decisions. I'll move the complex CIRGen code to a new CIRGenComplex.cpp file and keep the upstream code structure for now.

@Lancern
Copy link
Collaborator Author

Lancern commented Mar 23, 2024

Rebased onto the latest main, but not yet started to work on the reviews.

@Lancern Lancern marked this pull request as draft March 24, 2024 15:11
@Lancern Lancern marked this pull request as ready for review March 29, 2024 14:30
@bcardosolopes
Copy link
Member

@Lancern sorry, I was out on vacation. Let me know when you address all reviews and update the PR.

@Lancern
Copy link
Collaborator Author

Lancern commented Apr 6, 2024

@bcardosolopes Hi Bruno, I have updated the implementation so that it now follows the upstream skeleton.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. This is in the right direction, thanks the updates. More comments inline!

clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/lib/CIR/CodeGen/CIRGenCXX.cpp Outdated Show resolved Hide resolved
clang/lib/CIR/CodeGen/CIRGenExprComplex.cpp Show resolved Hide resolved
clang/lib/CIR/CodeGen/CIRGenDecl.cpp Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIRAttrs.td Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/test/CIR/CodeGen/complex.c Outdated Show resolved Hide resolved
@bcardosolopes
Copy link
Member

Let me know once this is ready for review again!

@bcardosolopes
Copy link
Member

@lanza rebased over the weekend, sorry for the churn. Can you rebase this PR again?

@Lancern
Copy link
Collaborator Author

Lancern commented May 12, 2024

Rebased onto the latest main.

@Lancern Lancern marked this pull request as ready for review May 14, 2024 14:39
@Lancern
Copy link
Collaborator Author

Lancern commented May 14, 2024

@bcardosolopes Hi Bruno, I believe this PR is now ready for another round of review.

I decide to keep this PR simpler (but the size of the PR is still quite big). In the latest version I made the following adjustments:

  • I replaced the #cir.complex attribute with the #cir.imag attribute. The former represents a whole complex number, while the latter represents a complex number that is purely imaginary (i.e. its real part is 0), which can be written in C as a number literal followed by the i suffix (e.g. 3.0i will become #cir.imag<#cir.fp<3.0>>).
  • I replaced the cir.complex.real and cir.complex.imag with cir.complex.get_real and cir.complex.get_imag, respectively. The main difference here is the get_* version yields pointers to the corresponding complex number component just like what cir.get_member is doing. This change is to support the __real__ and __imag__ operator which yield lvalues.
  • I removed a lot of code in CGExprComplex.cpp and left most visitors as unimplemented for now. Only a few fundamental complex number operations are supported in this PR so code review can be easier, including:
    • List initialization of complex numbers that initializes both of the real part and the imaginary part;
    • Imaginary literals;
    • Load and store of complex number values;
    • __real__ and __imag__ operator.

All supported operations are tested in clang/test/CIR/CodeGen/complex.c . Support for more complex number operations can be added in future PRs.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decide to keep this PR simpler (but the size of the PR is still quite big).

Thanks, I appreciate that.

I replaced the #cir.complex attribute with the #cir.imag attribute. The former represents a whole complex number, while the latter represents a complex number that is purely imaginary (i.e. its real part is 0), which can be written in C as a number literal followed by the i suffix (e.g. 3.0i will become #cir.imag<#cir.fp<3.0>>).

After looking the PR, I'm not sure we need #cir.imag, it should just be an int or fp attribute directly.

//      CHECK:   %[[#REAL:]] = cir.const #cir.fp<1.000000e+00> : !cir.double
// CHECK-NEXT:   %[[#IMAG:]] = cir.const #cir.fp<2.000000e+00> : !cir.double
// CHECK-NEXT:   %{{.+}} = cir.complex.create %[[#REAL]], %[[#IMAG]] : !cir.double -> !cir.complex<!cir.double>

This composes better with what cir.complex.create ends up doing (building imaginary out of regulaer fp/int directly).

I replaced the cir.complex.real and cir.complex.imag with cir.complex.get_real and cir.complex.get_imag, respectively. The main difference here is the get_* version yields pointers to the corresponding complex number component just like what cir.get_member is doing. This change is to support the real and imag operator which yield lvalues.

I'll address this in another comment.

I removed a lot of code in CGExprComplex.cpp and left most visitors as unimplemented for now. Only a few fundamental complex number operations are supported in this PR so code review can be easier, including

Awesome!

All supported operations are tested in clang/test/CIR/CodeGen/complex.c . Support for more complex number operations can be added in future PRs.

Can you please add LLVM IR support and test it altogether in complex.c?

clang/test/CIR/CodeGen/complex.c Outdated Show resolved Hide resolved
clang/test/CIR/CodeGen/complex.c Show resolved Hide resolved

void imag_literal(VOID) {
c = 3.0i;
ci = 3i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in imag_literal()

    %0 = cir.const #cir.imag<#cir.fp<3.000000e+00> : !cir.double> : !cir.complex<!cir.double>
    %1 = cir.get_global @c : !cir.ptr<!cir.complex<!cir.double>>
    cir.store %0, %1 : !cir.complex<!cir.double>, !cir.ptr<!cir.complex<!cir.double>>

How do you plan to distinguish if the cir.store is storing to real or imag part when lowering this? Relying on the imag attribute sounds brittle. Another reason to add LLVM lowering as part of the work, it'd have popped up this kind of question / design issue. This should be using the same mechanism as elsewhere:

    %0 = cir.const #cir.imag<#cir.fp<3.000000e+00> : !cir.double> : !cir.complex<!cir.double>
    %1 = cir.get_global @c : !cir.ptr<!cir.complex<!cir.double>>
    %3 = cir.complex.get_img %1 : !cir.ptr<!cir.complex<!cir.double>> -> !cir.ptr<!cir.double>
    cir.store %0, %3 : !cir.complex<!cir.double>, !cir.ptr<!cir.complex<!cir.double>>

Please rename cir.complex.get_img to cir.complex.imag_ptr, cir.complex.get_real to cir.complex.real_ptr.

Copy link
Collaborator Author

@Lancern Lancern Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I have removed #cir.imag the CIRGen scheme under discussion here now becomes:

c = 3.0i;
// %0 = cir.const #cir.fp<3.000000e+00> : !cir.double
// %1 = cir.get_global @c : !cir.ptr<!cir.complex<!cir.double>>
// %2 = cir.complex.get_img %1 : !cir.ptr<!cir.complex<!cir.double>> -> !cir.ptr<!cir.double>
// cir.store %0, %2 : !cir.complex<!cir.double>, !cir.ptr<!cir.complex<!cir.double>>

However this may be a bit hard to implement during CIRGen since this assignment is basically an assignment between two complex values. The upstream CodeGen tracks the real and imaginary parts of a complex value individually so it can generate such code directly. In CIR we may use some canonicalization passes to do this job?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a lot has changed since, let's ignore my first comment on this sub-thread. For the test on c = 3.0i, we now get:

//      CHECK: %[[#REAL:]] = cir.const #cir.fp<0.000000e+00> : !cir.double
// CHECK-NEXT: %[[#IMAG:]] = cir.const #cir.fp<3.000000e+00> : !cir.double
// CHECK-NEXT: %{{.+}} = cir.complex.create %[[#REAL]], %[[#IMAG]] : !cir.double -> !cir.complex<!cir.double>

I'm assuming that the result of cir.complex.create is then stored to the global, right? If so, this seems good enough already. Am I missing something?

However this may be a bit hard to implement during CIRGen since this assignment is basically an assignment between two complex values. The upstream CodeGen tracks the real and imaginary parts of a complex value individually so it can generate such code directly.

Can you elaborate on what upstream CodeGen does? I'm not sure I follow.

In CIR we may use some canonicalization passes to do this job?

It's always a good option, not sure for this specific example there's much we could do though (I guess it depends on the reply above).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that the result of cir.complex.create is then stored to the global, right? If so, this seems good enough already.

Yes. The result of cir.complex.create is then stored into the global.

Can you elaborate on what upstream CodeGen does?

Let's say we're trying to CodeGen the following assignment statement:

// int _Complex c;
c = 3i;

The statement contains two sub-expressions: the lhs is an lvalue that indicates the global variable c, and the rhs is a prvalue that represents a value of type int _Complex. To CodeGen the assignment statement, the CodeGen module will recursively CodeGen these two sub-expressions and then emit some store instructions to actually do the assignment.

When CodeGen-ing the rhs, the sub-expression CodeGen will yield a pair of LLVM values std::pair<llvm::Value *, llvm::Value *> to the statement CodeGen. This is how the upstream CodeGen represent a prvalue of complex type: the first element of the pair is the real part of the complex prvalue, and the second element is the imaginary part of the complex prvalue. When the statement CodeGen gets the pair of values, it then emits two store instructions to store the real and imaginary part into the global variable c.

In CIR, we take a different approach for CIRGen-ing prvalues of complex types. Instead of yielding a pair of values, CIRGen just yields a single mlir::Value that represents the whole complex value. Thus it would be more natural to successively generate a single store instruction to store the whole complex value into the global variable, rather than storing the real and imaginary parts separately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side note here: the literal 3i is a complex value whose real part is implicitly 0. Thus one cannot just CodeGen / CIRGen the storing of the imaginary part; the storing of the implicit real part is also necessary.

@Lancern Lancern marked this pull request as draft June 17, 2024 15:20
@Lancern
Copy link
Collaborator Author

Lancern commented Jun 17, 2024

After looking the PR, I'm not sure we need #cir.imag, it should just be an int or fp attribute directly.

I prefer keeping the #cir.complex to be used directly with cir.const, it composes well with the rest of CIR consts.

Hi Bruno, thanks for your reviews here. The problem with #cir.complex is that there is no literal expression in C that corresponds to it directly. One may think that the initializer list expression {real, imag} should be lowered to a #cir.complex but this could only happen when real and imag are both constant expressions. Otherwise you have no choice but fallback to cir.complex.create. Of course we could make {real, imag} lower to cir.const #cir.complex by trying to constant-evaluate the two elements during CIRGen.

#cir.imag, on the other hand, corresponds to the literal expression 1i, 2i, etc. In clang these literal expressions are of complex type and this is exactly what I'm trying to model in CIR here.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for revamping this, let's try to land it sooner than later. There are a few comments that seem like they didn't get address, can you update the ones that still need tackling versus the ones you consider done (I just solved a bunch but got unsure about few others).

Hi Bruno, thanks for your reviews here. The problem with #cir.complex is that there is no literal expression in C that corresponds to it directly. One may think that the initializer list expression {real, imag} should be lowered to a #cir.complex but this could only happen when real and imag are both constant expressions. Otherwise you have no choice but fallback to cir.complex.create. Of course we could make {real, imag} lower to cir.const #cir.complex by trying to constant-evaluate the two elements during CIRGen.

Ok, I was thinking we could have a fold operation of sorts to canonicalize this, but it could come as a next step after we land this.

#cir.imag, on the other hand, corresponds to the literal expression 1i, 2i, etc. In clang these literal expressions are of complex type and this is exactly what I'm trying to model in CIR here.

I understand that and I think it looks nice. However, I don't see any advantage of this representation brings - it's not even used in the tests (at least not tested AFAICT) - can you perhaps make the case? Perhaps it's something that can be later introduced if it proves itself worth for some more concrete reason?

@Lancern
Copy link
Collaborator Author

Lancern commented Jun 18, 2024

Rebased onto the latest main.

@Lancern Lancern marked this pull request as ready for review June 18, 2024 15:18
@Lancern
Copy link
Collaborator Author

Lancern commented Jun 18, 2024

I agree that #cir.imag looks inconsistent and not promising. Removed it from this PR, and now imaginary literals such as 1i is lowered to three ops: two cir.const and one cir.complex.create:

int _Complex c = 3i;
// %0 = cir.const #cir.int<0> : !s32i
// %1 = cir.const #cir.int<3> : !s32i
// %c = cir.complex.create %0, %1 : !s32i -> !cir.complex<!s32i>

Besides, updated #cir.zero so that it can be used for creating zero values of complex type:

int _Complex c;
// cir.global external @c = #cir.zero : !cir.complex<!s32i>

@Lancern
Copy link
Collaborator Author

Lancern commented Jun 18, 2024

Ok, I was thinking we could have a fold operation of sorts to canonicalize this

This looks nice. I'll add this in another PR once this PR lands!

@bcardosolopes
Copy link
Member

This looks nice. I'll add this in another PR once this PR lands!

Sounds good!

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hard work here, I'm glad to see this landing.

LGTM pending any new information from question in the comments!

clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved

void imag_literal(VOID) {
c = 3.0i;
ci = 3i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a lot has changed since, let's ignore my first comment on this sub-thread. For the test on c = 3.0i, we now get:

//      CHECK: %[[#REAL:]] = cir.const #cir.fp<0.000000e+00> : !cir.double
// CHECK-NEXT: %[[#IMAG:]] = cir.const #cir.fp<3.000000e+00> : !cir.double
// CHECK-NEXT: %{{.+}} = cir.complex.create %[[#REAL]], %[[#IMAG]] : !cir.double -> !cir.complex<!cir.double>

I'm assuming that the result of cir.complex.create is then stored to the global, right? If so, this seems good enough already. Am I missing something?

However this may be a bit hard to implement during CIRGen since this assignment is basically an assignment between two complex values. The upstream CodeGen tracks the real and imaginary parts of a complex value individually so it can generate such code directly.

Can you elaborate on what upstream CodeGen does? I'm not sure I follow.

In CIR we may use some canonicalization passes to do this job?

It's always a good option, not sure for this specific example there's much we could do though (I guess it depends on the reply above).

@Lancern
Copy link
Collaborator Author

Lancern commented Jun 23, 2024

Rebased onto the latest main.

This patch adds an initial support for the C complex type, i.e. `_Complex`. It
introduces the following new types, attributes, and operations:

- `!cir.complex`, which represents the C complex number type;
- `cir.complex.create`, which creates a complex number from its real and
  imaginary parts;
- `cir.complex.real_ptr`, which derives a pointer to the real part of a complex
  number given a pointer to the complex number;
- `cir.complex.imag_ptr`, which derives a pointer to the imaginary part of a
  complex number given a pointer to the complex number.

CIRGen for some basic complex number operations is also included in this patch.
@Lancern
Copy link
Collaborator Author

Lancern commented Jun 29, 2024

Rebased onto the latest main.

@bcardosolopes
Copy link
Member

Pending replies to the questions asked.

@Lancern
Copy link
Collaborator Author

Lancern commented Jul 3, 2024

Pending replies to the questions asked.

It seems to me that all questions should be resolved? Am I missing some questions? GitHub does not show my response to your latest question in the thread, it's here: #513 (comment)

@bcardosolopes
Copy link
Member

It seems to me that all questions should be resolved? Am I missing some questions? GitHub does not show my response to your latest question in the thread, it's here: #513 (comment)

Perfect, got confused by github! All seems good then, thanks for all the hard work here!

@bcardosolopes bcardosolopes merged commit ec94476 into llvm:main Jul 3, 2024
6 checks passed
@Lancern Lancern deleted the complex-type branch July 4, 2024 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants