-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR][CodeGen] Inline assembly: store the results #512
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Oleg, great to see this getting completed, nice! Few requests but nothing major.
My syntax suggestion:
%3 = cir.asm(x86_att, operands=[%2 : !u32i], attrs = [#cir.optnone]
{"addl $$42, $0 \0A\09 subl $$1, $0 \0A\09 imul $$2, $0" "=r,0,~{dirflag},~{fpsr},~{flags}"}
) side_effects -> !u32i
Basically:
- Add operands into a list like output
- Flip the current order
- Add a newline + indentation for the "textual" body (only if easy to implement).
Do what you can to keep this in tablegen, we probably don't need to go implement this in C++. Do we have verification that the list of attrs matches the number of operands? Also, please add a way to check for the operands in the tests, plain {{.*}}
is hiding important info. Also let me know if I'm reading how this works correctly.
There's one testcase that I'm not sure I understood:
float add5(float x, float y) {
__asm__("fadd %[x], %[y]"
: [x] "=&t" (x)
: [y] "f" (y)
);
return x;
}
Comes out as:
cir.func @add5(%arg0: !cir.float loc(fused[#loc74, #loc75]), %arg1: !cir.float loc(fused[#loc76, #loc77])) -> !cir.float extra(#fn_attr) {
%0 = cir.alloca !cir.float, cir.ptr <!cir.float>, ["x", init] {alignment = 4 : i64} loc(#loc380)
%1 = cir.alloca !cir.float, cir.ptr <!cir.float>, ["y", init] {alignment = 4 : i64} loc(#loc381)
%2 = cir.alloca !cir.float, cir.ptr <!cir.float>, ["__retval"] {alignment = 4 : i64} loc(#loc73)
cir.store %arg0, %0 : !cir.float, cir.ptr <!cir.float> loc(#loc78)
cir.store %arg1, %1 : !cir.float, cir.ptr <!cir.float> loc(#loc78)
%3 = cir.load %1 : cir.ptr <!cir.float>, !cir.float loc(#loc79)
%4 = cir.asm(x86_att, {"fadd $0, $1" "=&{st},f,~{dirflag},~{fpsr},~{flags}"})
operand_attrs = [#cir.optnone] %3 : (!cir.float) -> !cir.float loc(#loc80)
cir.store %4, %0 : !cir.float, cir.ptr <!cir.float> loc(#loc81)
It seems that x
is not being passed to the computation? it's never loaded but only stored.
I'm also a bit curious about t2
:
void t2(unsigned long long t) {
__asm__ volatile("" : "+m"(t));
}
...
cir.asm(x86_att, {"" "=*m,*m,~{dirflag},~{fpsr},~{flags}"})
operand_attrs = [#cir.optnone, !u64i, !u64i]
side_effects %0, %0 : (!cir.ptr<!u64i>, !cir.ptr<!u64i>) -> () loc(#loc97)
I'm a bit lost: (a) #cir.optnone
is for which operand? (b) What does it mean to have two types .., !u64i, !u64i
in operand_attrs
?
@bcardosolopes done
No, we don't. Do we need to? so far just
Well, it turned out that the test itself is wrong. I added two equal numbers and decided that float addition works. Looks like in the real life everything is a bit more complex - I mean from assembly point of view. So I fixed this test - and now both values are loaded and a proper sum is computed.
(a) Did I answered all the questions? please let me know if I missed anything |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Oleg, thanks for your patience!
No, we don't. Do we need to? so far just assert added
I've seen in the testcases that they don't match, perhaps I don't understand how they work then. But it's confusing to read. Can you clarify?
(a) #cir.optnone is for return value operand - it was a comment about - lowering from llvm dialect to LLVM IR needs it. But you're right - it's better to add it in the lowering part, not in the codegen one
I don't understand how optnone applies to a result, are you talking about the right attribute? Can you point me to where in clang/test/CIR/Lowering/asm.cir
you have that in the LLVM output?
(b) operand attributes used for LLVM IR call instruction - otherwise it can not be verified: Operand for indirect constraint must have elementtype attribute.
My question is more towards understanding what it means, when I read cir.asm
s in the testcases, I have no clue what some of the constructs are supposed to mean. How can we make it more obvious? Could they be inferred?
Did I answered all the questions? please let me know if I missed anything
I'm still trying to wrap my head around the cir.asm
output. Left more comments inline.
clang/test/CIR/CodeGen/asm.c
Outdated
//CHECK: cir.asm(x86_att, {"" "=*m,*m,~{dirflag},~{fpsr},~{flags}"}) operand_attrs = [#cir.optnone, !s32i, !s32i] side_effects %0, %0 : (!cir.ptr<!s32i>, !cir.ptr<!s32i>) -> () | ||
void t1(int x) { | ||
// CHECK: cir.asm(x86_att, operands = [%0, %0 : !cir.ptr<!s32i>, !cir.ptr<!s32i>] attrs = [!s32i, !s32i] | ||
// CHECK: {"" "=*m,*m,~{dirflag},~{fpsr},~{flags}"}) side_effects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain to me what's going on here for every piece of this operation?
I didn't look at this for couple weeks and now I can't understand what's going on, maybe this is telling something about making this easier to read. I can't make a connection between operands
and attrs
, or why in the operands list we have a %0
without a type, then we have another one with a type, etc - not to mention the number of elements seem arbitrary and hard to correlate back. What are the types in attrs
supposed to give? Can't they be inferred?
It's very hard to read these operations and get the meaning out of it, perhaps inline asm is just that hard, but I rather we be verbose and make it easier to understand.
clang/test/CIR/CodeGen/asm.c
Outdated
// CHECK: [[TMP0:%.*]] = cir.alloca !s32i, cir.ptr <!s32i>, ["a"] | ||
// CHECK: [[TMP1:%.*]] = cir.load %0 : cir.ptr <!u32i>, !u32i | ||
// CHECK: [[TMP2:%.*]] = cir.asm(x86_att, operands = [[[TMP1]] : !u32i] attrs = [#cir.optnone] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you got rid of them, what #cir.optnone
here means then?
@bcardosolopes done with First of all, it's easy to confuse operands to C asm instruction and operands to CIR operation.
Given that in gerenal the inline assembly follows the next format :
No, they both have a type: here we wrote it in Attributes.The number of attributes should be equal to the number of operands. The only attribute we use - is a type attribute and it has a meaning of element type, i.e. it's again refers to an interaction with a memory. Not all the operands has this attribute. For instance, if we change the constraint in the example to
I didn't get rid of them. I got rid of the attribute that related to the result and add it in the lowering. Previously the number of attributes may or may not be equal to the number of operands, depending if the instruction in question has a result of not.
There is a shift for the case when the result exists. And previously I handled it in CIR by placing extra
Well, we infer them in
where the second member of the return value is out future attribute. Sort of summary
Yes, it's really hard to understand! and hard to read! I also have a hard time when I need to read the inline assembly, how it's represented in CIR, how it's lowered into dialect, how it's later lowered to LLVM IR ... And I would move everything to the lowering, but it looks like it's not easy to do - we need to have a full access to all the ast expressions and statements involved. well ... Let's just agree on the following:
Please, don't hesitate to ask more questions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, nice! Sorry this is taking long. I have more questions/comments cause I'm still having some trouble to decode.
first we list operands and then operands types. So we have operands = [%0, %0 : !cir.ptr<!s32i>, !cir.ptr<!s32i>].
Is my reading below correct?
operands = [%0, // output operand. where does it infers the type from?
%0 : !cir.ptr<!s32i>, // input operand with type.
!cir.ptr<!s32i> // type of the constraint?
].
This all seems like it's encoding things in a smart way to make it easier to unpack later, but looking into it without understanding how the parser works is very hard. Can you make the output look something we can distingish between input operands, reuse_input/output operands and constraints? I'd like users to look at it and not have to understand how this is parsed, we should probably be more verbose if necessary. Perhaps better we agree on how it looks like before you change code, can you suggest something? I'm afraid I don't even understand the proper order to suggest something.
There is a shift for the case when the result exists.
It's very smart, but very hard to figure out.
And previously I handled it in CIR by placing extra optnone in the list of operands, now it's done in the lowering
#cir.optnone
exists for marking functions to do not be optimized by the compiler. I understand why you want to use it, but it's not the right attribute to use - overall we shouldn't also create an attribute to indicate empty space. I think we should change the syntax as to not need this.
there are operands of the CIR operation. And the number of operands of operations may be different from the number of operands of C asm instruction you see in some source code.
Ok. My take: it's fine if it's different, but it'd be great if we can look at the operation and know what the operands are without having to understand the parser logic - split into more arrays, tie some of the properties together? It's fine if the parser/printing has to become a bit more complex.
there are attributes to the CIR operation's operands. Some operands need attributes,
Ok. My take: I'd like us to be able to look at such attributes and easily identify what operands they refer to. And would like to also understand which operand is also output (for when it repeats, perhaps should be a different list?)
some of them don't. The latter are attributed as cir.optnone
Like mentioned above about cir.optnone
.
Not really, sorry :) Forget about the fact it's a list of operands for inline assembly. It's just a list of mlir values( I admire your desire to make it more readable. And I think it's time to make in C++ code. I suggest we do the following.
Thoughts? It will take a time though but will make everything readable in the same time! UPD: more or less done with custom printing for the |
Thanks for the fast reply.
Oh, interesting, this makes sense. I didn't realize we're doing that with I like your suggestions, I suggest few minor tweaks:
I believe we are converging! Few questions remaining:
|
@bcardosolopes done!
So far we have one attribute per operand. And it's kind of design choice based on the llvm dialect counterpart. Once we'll need something else - we will need to update the Speaking about this |
This is fine for now, it was just out of curiosity.
Suggestion: before we create the instruction, we (1) get the mlir::Type of Btw, looking at the LLVM docs:
Seems like the motivation there is because LLVM ptrs are opaque, so it's possible we don't run into the problem. |
yes, that makes sense! And I checked - you are right, it seems we can infer an element type from operand type.
So we need to invent something instead. Thoughts? |
Are you talking about the element type attribute or about other attributes as well? My suggestion here was to get rid of element type, but if there are other attributes that are needed, we should thread them for sure. Another one I was opposed to was
I guess I need to understand how the meaning differs between the two examples to suggest something. We could add an attribute for the element type thing, but I rather see it being described with the property it holds instead of "elementtype", which has opaque meaning for people working on the CIR level. By looking at the C code, what's the property that tells me whether I want to later have "elementtype" in the LLVM IR? |
Yes, the only attribute I'm talking about is the element type. I understand your desire to remove it from CIR (and I would glad to) and add it in the lowering only.
I don't use it anymore, and use just Now let's focus on the problem: we need to understand how to add attributes in the lowering, i.e. for each operand we want to decide if we need to do add such attribute.
Basically, we infer if the operand needs this attribute from the constraints. But the rules how to do it are not the trivial ones. So I provide several examples (assuming Example 1. No element type attribute
Example 2. One element type attribute
Example 3. Two element type attributes
the same as in the second example, but Example 4 Break the logic. One element type attribute. Again.
SummaryThus, in order to infer whether we need to add the attribute for the operand or not in the lowering, we need to repeat this logic. So, please let me know what is on your mind. |
Thanks for the write up, I think I get it now :)
The approach from the examples you have sound good, no need to go try another approach. So this is my suggestion based on the discussion so far and on the new examples you provided (let me know if they make sense):
Another example:
Note I added options without and with the type being inferred (one idea is to assert in CIRGen if those differ, and then we can better elaborate on the attribute defintion). I also suggest Thoughts? |
@bcardosolopes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, very happy to see this landing. Thanks for your patience going over all of this and explaining to me all the annoying little bits :)
LGTM
This PR adds storing of the results of inline assembly operation. This is a **final** step (I hope: ) ) from my side to support inline assembly. There are some features that remains unimplemented, but basic things should work now, For example, we can do addition and get the results - I explicitly added several tests for that, so you can test them in real. For instance, the next program being compiled with CIR should give you 7 as the result: ``` int add(int x, int y) { int a; __asm__("addl %[y], %[x]" : "=r" (a) : [x] "r" (x), [y] "r" (y) ); return a; } int main() { printf("run %d\n", add(3, 4)); return 0; } ``` So, the main thing remains is pretty printing. As I said I added several examples, and may be it will become more clear how to print better. Also, I added several tests from original codegen in order to check that we don't fail. And I can add some checks there as well when we come to better solution on printing.
This PR adds storing of the results of inline assembly operation. This is a **final** step (I hope: ) ) from my side to support inline assembly. There are some features that remains unimplemented, but basic things should work now, For example, we can do addition and get the results - I explicitly added several tests for that, so you can test them in real. For instance, the next program being compiled with CIR should give you 7 as the result: ``` int add(int x, int y) { int a; __asm__("addl %[y], %[x]" : "=r" (a) : [x] "r" (x), [y] "r" (y) ); return a; } int main() { printf("run %d\n", add(3, 4)); return 0; } ``` So, the main thing remains is pretty printing. As I said I added several examples, and may be it will become more clear how to print better. Also, I added several tests from original codegen in order to check that we don't fail. And I can add some checks there as well when we come to better solution on printing.
This PR adds storing of the results of inline assembly operation. This is a **final** step (I hope: ) ) from my side to support inline assembly. There are some features that remains unimplemented, but basic things should work now, For example, we can do addition and get the results - I explicitly added several tests for that, so you can test them in real. For instance, the next program being compiled with CIR should give you 7 as the result: ``` int add(int x, int y) { int a; __asm__("addl %[y], %[x]" : "=r" (a) : [x] "r" (x), [y] "r" (y) ); return a; } int main() { printf("run %d\n", add(3, 4)); return 0; } ``` So, the main thing remains is pretty printing. As I said I added several examples, and may be it will become more clear how to print better. Also, I added several tests from original codegen in order to check that we don't fail. And I can add some checks there as well when we come to better solution on printing.
This PR adds storing of the results of inline assembly operation. This is a **final** step (I hope: ) ) from my side to support inline assembly. There are some features that remains unimplemented, but basic things should work now, For example, we can do addition and get the results - I explicitly added several tests for that, so you can test them in real. For instance, the next program being compiled with CIR should give you 7 as the result: ``` int add(int x, int y) { int a; __asm__("addl %[y], %[x]" : "=r" (a) : [x] "r" (x), [y] "r" (y) ); return a; } int main() { printf("run %d\n", add(3, 4)); return 0; } ``` So, the main thing remains is pretty printing. As I said I added several examples, and may be it will become more clear how to print better. Also, I added several tests from original codegen in order to check that we don't fail. And I can add some checks there as well when we come to better solution on printing.
This PR adds storing of the results of inline assembly operation. This is a **final** step (I hope: ) ) from my side to support inline assembly. There are some features that remains unimplemented, but basic things should work now, For example, we can do addition and get the results - I explicitly added several tests for that, so you can test them in real. For instance, the next program being compiled with CIR should give you 7 as the result: ``` int add(int x, int y) { int a; __asm__("addl %[y], %[x]" : "=r" (a) : [x] "r" (x), [y] "r" (y) ); return a; } int main() { printf("run %d\n", add(3, 4)); return 0; } ``` So, the main thing remains is pretty printing. As I said I added several examples, and may be it will become more clear how to print better. Also, I added several tests from original codegen in order to check that we don't fail. And I can add some checks there as well when we come to better solution on printing.
This PR adds storing of the results of inline assembly operation.
This is a final step (I hope: ) ) from my side to support inline assembly.
There are some features that remains unimplemented, but basic things should work now, For example, we can do addition and get the results - I explicitly added several tests for that, so you can test them in real.
For instance, the next program being compiled with CIR should give you 7 as the result:
So, the main thing remains is pretty printing. As I said I added several examples, and may be it will become more clear how to print better.
Also, I added several tests from original codegen in order to check that we don't fail. And I can add some checks there as well when we come to better solution on printing.