-
-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C] Handle BinOp for SIMDArray #2836
[C] Handle BinOp for SIMDArray #2836
Conversation
392e616
to
eef5b18
Compare
Is this ready for review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. I don't know if this is ready for review.
Yup, this is ready! |
I'm marking this as a draft for now. For implementing SIMD assignment using vector extensions. |
I made some research and came across the following: [...]
!LF$ attributes simd :: A
real :: A(4)
A = 23.
! or
A = i ! value: `23.`
[...] We can do: // C code
[...]
A = (float __attribute__ (( vector_size(sizeof(float) * 4) ))){23., 23., 23., 23.}
// or
A = (float __attribute__ (( vector_size(sizeof(float) * 4) ))){i, i, i, i}
// or
A = i - (float __attribute__ (( vector_size(sizeof(float) * 4) ))){ }
// See for more details: https://stackoverflow.com/a/43801280/15913193
[...] I thought of doing the third one. Array initialisation: [...]
!LF$ attributes simd :: A
real :: A(4), C(8)
C = 3.
A = C(:4)
[...] We can do // C code
[...]
float a __attribute__ (( vector_size(sizeof(float) * 4) ));
struct r32 c_value;
struct r32* c = &c_value;
float c_data[8];
c->data = c_data;
c->n_dims = 1;
c->dims[0].lower_bound = 1;
c->dims[0].length = 8;
memcpy(&a, c->data, sizeof(float) * 4);
[...] @certik do yo know any other builtin functions that we can use here? |
Also, should we predefine the types as the following in typedef float v8float __attribute__ ((vector_size (32))); /* float[8], AVX */
typedef double v4double __attribute__ ((vector_size (32))); /* double[4], AVX */
typedef float v4float __attribute__ ((vector_size (16))); /* float[4], SSE */ |
Do you mean in our C++ code? Not sure. ' I would not predefine |
for (__1_t=0; __1_t<=7; __1_t++) { | ||
a[__1_t] = (float)(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the intention? I am confused, the older version feels correct. This loop is just normal C code, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. I think we should ideally avoid accessing individual vector elements and leave it upto the processor to handle vector operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, Yea, I thought the same. I'm working on it and will push the changes soon.
eef5b18
to
ea2227f
Compare
Current design: !LF$ attributes simd :: A
real :: A(4), C(8)
real :: i = 12.
A = i C Backend a = (float __attribute__ (( vector_size(sizeof(float) * 4) ))) {i, i, i, i}; Fortran example A = 1.2 C Backend a = (float __attribute__ (( vector_size(sizeof(float) * 4) ))) { 1.19999999999999996e+00, 1.19999999999999996e+00, 1.19999999999999996e+00, 1.19999999999999996e+00}; Fortran example C = 42
A = C C Backend memcpy(&a, c->data, sizeof(float) * 4); Fortran example A = C(2:) C Backend memcpy(&a, c->data + (2 - c->dims[0].lower_bound), sizeof(float) * 4); |
2dfcd8c
to
6b592a7
Compare
Ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is fine. However, I think only the simd version is being tested, not the non-simd version, correct? I think we need to update our cmake tester to test both versions.
src/libasr/codegen/asr_to_c.cpp
Outdated
} | ||
array_const_str += "}"; | ||
src = "(" + cast + ") " + array_const_str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we do cast here? This seems something like hard coded, where as the previous one where we actually insert the cast while visiting the visit_ArrayPhysicalCast()
is more general I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ArrayBroadcast
should only be used for SIMDArray
, otherwise throw error.
And for SIMDArray
we always need a cast, so I moved it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a case, where this operation has to be performed in visit_ArrayPhysicalCast
. If it requires we will use this there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ArrayBroadcast should only be used for SIMDArray
I think it can be used for regular arrays as well. At the moment, we only support array broadcast for SIMD arrays at the C backend level. I would keep the implementation generalised as before instead of hardcoding.
And for SIMDArray we always need a cast, so I moved it here.
I think we can't say that a cast would always be needed. Consider if we are broadcasting an SIMD
array. Then I think a cast is not needed here.
I think the backend should just be completely "dumb" and just follow what the ASR says. If the ASR has a cast node, then the backend adds the cast operation. If there is no cast node in the ASR, then there should be no cast generated by the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be used for regular arrays as well.
Nope, the ArrayBroadcast design is that we handle all the ArrayBroadcast in the array_op itself except for the SIMDArray case, i.e., we shouldn't be visiting ArrayBroadcast in the backends except for SIMDArray
Regarding the Cast, I had some problem with adding the cast in visit_ArrayPhysicalCast
so I moved them here, I will look into it and report back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if we are broadcasting an SIMD array. Then I think a cast is not needed here.
I think SIMDArray would have an assignment here:
(=
(Var 4 a)
(Var 4 b)
()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't be visiting ArrayBroadcast in the backends except for SIMDArray
Nope, the ArrayBroadcast design is that we handle all the ArrayBroadcast in the array_op itself except for the SIMDArray case, i.e., we shouldn't be visiting ArrayBroadcast in the backends except for SIMDArray
Yes, I know the array_op
pass handles the ArrayBroadcast
. But think it from the general perspective. In general, I think the ArrayBroadCast is meant for all types of arrays. If you plan to support only simd
for now or in future, I would add an LCompilersAssert(is_simd_array())
(or anything similar) so that it indicates that only simd
arrays are supported and helps us catch unexpected bugs in ArrayBroadcast
(For example when some other type of array gets passed to ArrayBroadcast.)
I think SIMDArray would have an assignment here:
Consider a
with length 128
and b
with length 256
and we have b = a
. I think here we need to broadcast a
so that it matches the length of b
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we wouldn't be able to add a check LCompilersAssert(is_simd_array())
in ArrayBroadcast
, as ArrayBroadcast type will always be a FixedArraySize
. Maybe it might be possible in visit_Assignment
or visit_ArrayPhysicalCast
.
Consider a with length 128 and b with length 256 and we have b = a. I think here we need to broadcast a so that it matches the length of b.
We shouldn't allow the assignment of different size array, right?
GFortran throws an error for it:
$ gfortran examples/expr2.f90 && ./a.out
examples/expr2.f90:6:0:
6 | y = x
|
Error: Different shape for array assignment at (1) on dimension 1 (256 and 128)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fine for now. We can also add the assert later as the design evolves.
} else if (ASRUtils::is_simd_array(x.m_v)) { | ||
index += src; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ideally not implement array item for SIMD array until utmost necessary. I think we should avoid using it as much as possible, ideally never use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I used this for the print statement, just for debugging. If this is not required, I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we introduce an option, like --print-simd
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @certik
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we introduce an option, like --print-simd?
I do not have any opinion on this. I would just do best to avoid as many if(is_simd_array())
, then
, else
as possible, so that the code is still clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would should "--print-simd" do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was, Should we print SIMDArray?
print_arr
pass converts the SIMDArray to a do_loop to print the values, should we allow it?
print *, a
--show-fortran
do __1_k = lbound(a, 1), ubound(a, 1)
print *, a(__1_k)
end do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src = "((" + result_type + ")" + var_name + "->dims[" + idx + "-1].lower_bound)"; | ||
if (ASRUtils::is_simd_array(x.m_v)) { | ||
src = "0"; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lower_bound
for an SIMD array in not meaningful. I am unsure, but I think this case should never be triggered for SIMD arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, used for printing the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, no opinion on this. Mostly likely it would not get triggered, so seems like a dead code to me at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I thought of removing them for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep this for further implementation of SIMDArray, as it would be helpful for debugging.
I will create an issue to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please post a link of the issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contributions, Thirumalai. I shared some comments above. Overall, it looks good to me. |
Looking at the changes in the backend, it seems (and as we expected) there are some/several I think as we dive more, the design might get clearer. |
6b592a7
to
9d28f26
Compare
What needs to be done to finish this? |
9d28f26
to
0ec0d61
Compare
res = A + B | ||
C(:4) = res | ||
res = A * B | ||
C(5:) = res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a print *, C
here (before the if
assert) so that it is helpful for debugging later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useless for the CI, so let's not do it. The developer can always add a print *, C
and test it, but not git stage it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useless for the CI, so let's not do it. The developer can always add a print *, C and test it, but not git stage it.
I think the tests are not just for the CI but also to help developers debug an issue (when any arises). I think there is no disadvantage in printing value on the console, but I think there are advantages:
- Helps developers debug the test (without making changes to it)
- Most important: Ensures that the test actually runs and produces values. The produced values can be seen by the developers on the console. Previously I have experienced that we had some tests which have asserts (and no prints) and these asserts never got run or get triggered because the function that would test them got removed by the
unused_functions
pass. So, the test would pass (because there is no checking being done) and the developer gets the false impression that the test works correctly. I think a print in this would have been very helpful in avoiding such situation. Since the developer would have known in the first place when adding such test as he would have noticed no value being printed on the console (because even the print would be removed as the function itself is removed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know it would print an output on the console on failure. In that case, I think we can add a print statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/libasr/codegen/asr_to_c_cpp.h
Outdated
} else { | ||
value += "->data"; | ||
} | ||
} else if (ASR::is_a<ASR::Var_t>(*x.m_target)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this instead be ?
} else if (ASR::is_a<ASR::Var_t>(*x.m_target)) { | |
} else if (ASR::is_a<ASR::Var_t>(*x.m_value)) { |
Also, based on the if
-else
conditions, I think the above case is unused currently (or is like a dead code)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Could you specify what part was updated? I can see no change in the above if
statement. Did you push your changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, it seems updated. Thanks.
I think reference tests need to be updated. |
0ec0d61
to
68d80db
Compare
@Thirumalai-Shaktivel Could you also check if the above changes work with LPython by submitting a PR? |
68d80db
to
920dbb1
Compare
@Shaikh-Ubaid, you can do a final review now. Yup, I was planning to do the same, after the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems good to me. Thanks for this.
If these changes work with LPython, I think it is good to merge.
LPython: lcompilers/lpython#2425 |
920dbb1
to
a4098ef
Compare
Thanks for the review! |
It seems this PR is merged, but the related PR is still open (I think it might be waiting for approval). Please merge the PR and its related PRs together or at similar times (ideally when there is approval on both/all the PRs). This will help keep the |
Towards #2293