[ASR Pass] FMA: Create an IntrinsicScalarFunction for SIMDArray #2891
integration_tests/matmul_01.f90 (outdated)
```diff
@@ -142,7 +142,7 @@ program matmul_01
 real :: err
 
 ! Use n = 960 for a good benchmark
-n = 96
+n = 960
```
Note: This takes ~5 sec using LFortran:
```
[...]
671/671 Test #651: matmul_01_FAST ......................... Passed 4.92 sec
[...]
```
GFortran:
```
[...]
814/814 Test #791: matmul_01 .......................... Passed 0.75 sec
[...]
```
That's too long, make the test finish in under 100ms. For benchmarking we'll change the lengths by hand. Ah I see, that's when you change it manually. All ok then.
Yes, we'll make it faster for this case, to ensure our design is correct.
Suggested change:
```diff
-n = 960
+n = 96
```
Done!
Change it back to n = 96. Other than that, I think this looks good, thanks!
Branch updated from 436ab45 to fa99a5a.
I think the reference tests need to be updated.
Apart from reference tests, it looks fine to me. Thanks for this.
src/libasr/pass/fma.cpp (outdated)
```diff
@@ -123,6 +123,9 @@ class FMAVisitor : public PassUtils::SkipOptimizationFunctionVisitor<FMAVisitor>
     }
 
     void visit_Assignment(const ASR::Assignment_t& x) {
+        if (ASRUtils::is_simd_array(x.m_target)) {
```
It seems we currently skip the pass here. @Thirumalai-Shaktivel, do you know if there is any `fma` operation for vectors?
Nope, I need to check the documentation.
Ok, please let us know as soon as there is any update.
Yes, fma is allowed for vector registers on x86 as well as other platforms.
We should represent it in ASR as "fma", then in the backend generate appropriate instructions.
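To illustrate what representing the operation as "fma" in the IR could mean, here is a toy C sketch of the pattern rewrite an FMA pass performs: `a*b + c` (or `c + a*b`) becomes a single fused node that a backend can later lower to a scalar or vector FMA instruction. The `Expr` type and `fuse_multiply_add` function are hypothetical; they are not LFortran's ASR nodes or its `FMAVisitor`.

```c
#include <stddef.h>

/* Hypothetical miniature expression tree; NOT LFortran's ASR. */
typedef enum { EXPR_VAR, EXPR_ADD, EXPR_MUL, EXPR_FMA } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;        /* used by EXPR_VAR only */
    struct Expr *x, *y, *z;  /* operands; z is used by EXPR_FMA only */
} Expr;

/* Rewrite Add(Mul(a, b), c) or Add(c, Mul(a, b)) into Fma(a, b, c) in place.
   Operands are rewritten first (post-order), so nested sums fuse bottom-up.
   Assumes well-formed trees: Add/Mul nodes always have non-NULL x and y. */
static void fuse_multiply_add(Expr *e) {
    if (e == NULL || e->kind == EXPR_VAR) return;
    fuse_multiply_add(e->x);
    fuse_multiply_add(e->y);
    fuse_multiply_add(e->z);
    if (e->kind != EXPR_ADD) return;

    Expr *mul = NULL, *addend = NULL;
    if (e->x->kind == EXPR_MUL) {
        mul = e->x;
        addend = e->y;
    } else if (e->y->kind == EXPR_MUL) {
        mul = e->y;
        addend = e->x;
    }
    if (mul == NULL) return;  /* no multiply to fuse */

    /* The Add node becomes Fma(mul.x, mul.y, addend). */
    e->kind = EXPR_FMA;
    e->x = mul->x;
    e->y = mul->y;
    e->z = addend;
}
```

Because an FMA rounds once instead of twice, the fused result can differ in the last bit from a separate multiply and add, which is why this kind of rewrite is typically gated behind a fast-math style option.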
Yup. For LLVM we can use `llvm.fma.v3f32`. For the C backend, we could use `immintrin.h` on x86, but it is not available on M1, so I decided to let the C compiler handle it.
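A minimal sketch of "letting the C compiler handle it", assuming the C backend declares SIMD arrays with the GCC/Clang `vector_size` attribute shown later in this thread. The element-wise `a*b + c` is written as a plain vector expression with no `immintrin.h` intrinsics, so it compiles on x86 and on M1 alike; whether it is contracted into a vector FMA instruction depends on the target and the compiler's `-ffp-contract` setting. The `v8sf` typedef and `simd_fma` helper are illustrative names, not backend output.

```c
/* 8 x float SIMD vector via the GCC/Clang vector extension. */
typedef float v8sf __attribute__((vector_size(sizeof(float) * 8)));

/* Element-wise r = a*b + c on whole vectors. Operands are passed by
   pointer to avoid ABI warnings for 32-byte vectors when AVX is not enabled.
   The compiler may fuse the multiply and add into one FMA per lane. */
static void simd_fma(v8sf *r, const v8sf *a, const v8sf *b, const v8sf *c) {
    *r = (*a) * (*b) + (*c);
}
```

The same source stays portable across targets, while the compiler chooses the concrete instruction.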
Branch updated from fa99a5a to 8442bf3.
```diff
@@ -1218,7 +1218,6 @@ R"( // Initialise Numpy
     We need to generate:
     a = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
 */
-CHECK_FAST_C(compiler_options, x)
```
Other visitors have it. Could you share why we need to remove this?
Consider `simd_01` as an example: with the `--fast` option, we visit `ArrayConstant` and create the following function call:
```c
a = (float __attribute__ (( vector_size(sizeof(float) * 8) ))) array_constant_r32dim(8, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00)
```
I think this shouldn't be applied for SIMDArray.
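For comparison, a sketch of the direct assignment the comment in the diff asks for ("We need to generate: a = {1.0, ...};"), assuming the same `vector_size` type. Plain `a = {...};` is only valid C in a declaration, so an assignment would need a vector compound literal. `v8sf` and `assign_simd_constant` are illustrative names, not what the backend currently emits.

```c
/* Same vector type as the SIMDArray declaration in the generated C. */
typedef float v8sf __attribute__((vector_size(sizeof(float) * 8)));

/* Assign a SIMD ArrayConstant directly from a vector compound literal,
   with no helper-function call such as array_constant_r32dim(...). */
static void assign_simd_constant(v8sf *a) {
    *a = (v8sf){1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f};
}
```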
Also, we have to make sure that `ArrayBroadcast` is visited in the backend only for SIMDArray assignments. We have to add a TODO or create an issue for it.
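A hypothetical sketch of what the backend could emit when an `ArrayBroadcast` of a scalar is assigned to a SIMDArray: splat the value into every lane with a compound literal, again as a direct assignment rather than a helper call. The names `v8sf` and `broadcast_to_simd` are assumptions for illustration only.

```c
/* Same vector type as the SIMDArray declaration in the generated C. */
typedef float v8sf __attribute__((vector_size(sizeof(float) * 8)));

/* Broadcast one scalar into all 8 lanes. Non-constant elements are
   allowed in a block-scope compound literal (C99). */
static void broadcast_to_simd(v8sf *a, float value) {
    *a = (v8sf){value, value, value, value, value, value, value, value};
}
```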
Ok, this seems slightly concerning, since other visitors use it and `ArrayBroadcast` does not. Is there another way to implement this so that `ArrayBroadcast`, like the other visitors, also uses `CHECK_FAST_C()`?
If it is complicated, then maybe we can support it in a separate PR.
If there is a design decision that you don't know the answer to, create an issue, describe the problem, etc.
And let's merge some solution so that `--fast` just works. Then we can iterate on it.
Creating a function and assigning its value (which is what calling `CHECK_FAST_C` does) is slower than direct assignment. Also, it might not be possible to know the array type, since `ArrayConstant` will always be a `FixedSizeArray`.
In LLVM, too, we don't use the `m_value`.
Branch updated from 8442bf3 to 483dfb8.
Ready!
Branch updated from 483dfb8 to 63076d8.
Thanks for the approvals; if there is any issue, we will fix it iteratively!
Fixes: #2886
Fixes: #2887