Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASR Pass] FMA: Create an IntrinsicScalarFunction for SIMDArray #2891

Merged
merged 7 commits into from
Nov 24, 2023

Conversation

Thirumalai-Shaktivel
Copy link
Member

Fixes: #2886
Fixes: #2887

@@ -142,7 +142,7 @@ program matmul_01
real :: err

! Use n = 960 for a good benchmark
n = 96
n = 960
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This takes ~5 sec using LFortran

[...]
671/671 Test #651: matmul_01_FAST .........................   Passed    4.92 sec
[...]

GFortran:

[...]
814/814 Test #791: matmul_01 ..........................   Passed    0.75 sec
[...]

Copy link
Contributor

@certik certik Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's too long, make the test finish in under 100ms. For benchmarking we'll change the lengths by hand. Ah I see, that's when you change it manually. All ok then.

Yes, we'll make it faster for this case, to ensure our design is correct.

@@ -142,7 +142,7 @@ program matmul_01
real :: err

! Use n = 960 for a good benchmark
n = 96
n = 960
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
n = 960
n = 96

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@certik certik marked this pull request as draft November 22, 2023 04:47
@certik
Copy link
Contributor

certik commented Nov 22, 2023

Change the n=96. Other than that, I think this looks good, thanks!

@Shaikh-Ubaid
Copy link
Member

I think the reference tests need to be updated.

Copy link
Member

@Shaikh-Ubaid Shaikh-Ubaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from reference tests, it looks fine to me. Thanks for this.

@@ -123,6 +123,9 @@ class FMAVisitor : public PassUtils::SkipOptimizationFunctionVisitor<FMAVisitor>
}

void visit_Assignment(const ASR::Assignment_t& x) {
if (ASRUtils::is_simd_array(x.m_target)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we skip the pass currently. @Thirumalai-Shaktivel Do you know if there is any fma operation for vectors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I need to check the documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I need to check the documentation.

Ok, please let us know as soon as there is any update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fma is allowed for vector registers on x86 as well as other platforms.

We should represent it in ASR as "fma", then in the backend generate appropriate instructions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, For LLVM we can use llvm.fma.v3f32.
For C backend, we can use immintrin.h for x86 but it is not available for M1, so I decided to let the C compiler handle it.

@@ -1218,7 +1218,6 @@ R"( // Initialise Numpy
We need to generate:
a = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
*/
CHECK_FAST_C(compiler_options, x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other visitors have it. Could you share why we need to remove this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider simd_01 as an example, for which we visit ArrayConstant while using --fast option and create the following function call:

    a = (float __attribute__ (( vector_size(sizeof(float) * 8) ))) array_constant_r32dim(8,   1.00000000000000000e+00,   1.00000000000000000e+00,   1.00000000000000000e+00,   1.00000000000000000e+00,   1.00000000000000000e+00,   1.00000000000000000e+00,   1.00000000000000000e+00,   1.00000000000000000e+00)

I think this shouldn't be applied for SIMDArray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we have to make sure that ArrayBroadcast must be visited in the backend only for SIMDArray assignment.
We have to add a TODO or create an issue for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it seems to be slightly concerning as other visitors use it and the ArrayBroadcast is not using it. Is there any other way to implement this so that ArrayBroadcast similar to other visitors also uses the CHECK_FAST_C()?

If it is complicated, then maybe we can support it in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a design decision that you don't know the answer to, create an issue, describe the problem, etc.

And let's merge some solution so that --fast just works. Then we can iterate on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a function and assigning a value (calling the function CHECK_FAST_C does it) is slower than the direct assignment. Also, it might be not possible to know the array type as ArrayConstant will always be a FixedSizeArray.

In LLVM also, we don't use the m_value.

@Thirumalai-Shaktivel Thirumalai-Shaktivel marked this pull request as draft November 23, 2023 04:07
@Thirumalai-Shaktivel Thirumalai-Shaktivel added the asr pass Issue or pull request specific to ASR pass label Nov 23, 2023
@Thirumalai-Shaktivel
Copy link
Member Author

Ready!

@Thirumalai-Shaktivel
Copy link
Member Author

Thanks for the approvals; if there is any issue, we will fix it iteratively!

@Thirumalai-Shaktivel Thirumalai-Shaktivel changed the title [ASR Pass] Skip SIMDArray's in FMA pass [ASR Pass] FMA: Create an IntrinsicScalarFunction for SIMDArray Nov 24, 2023
@Thirumalai-Shaktivel Thirumalai-Shaktivel merged commit cb439e5 into lfortran:main Nov 24, 2023
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asr pass Issue or pull request specific to ASR pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

matmul_01 fails for --fast in both the C and LLVM backend matmul_01 fails for n = 960 using GFortran
4 participants