[ASR Pass] FMA: Create an IntrinsicScalarFunction for SIMDArray #2891

Thirumalai-Shaktivel · 2023-11-22T03:30:25Z

Fixes: #2886
Fixes: #2887

Thirumalai-Shaktivel · 2023-11-22T03:32:08Z

integration_tests/matmul_01.f90

@@ -142,7 +142,7 @@ program matmul_01
 real :: err

 ! Use n = 960 for a good benchmark
-n = 96
+n = 960


Note: This takes ~5 sec using LFortran

[...] 671/671 Test #651: matmul_01_FAST ......................... Passed 4.92 sec [...]

GFortran:

[...] 814/814 Test #791: matmul_01 .......................... Passed 0.75 sec [...]

That's too long; make the test finish in under 100 ms. For benchmarking we'll change the lengths by hand. Ah, I see, that's when you change it manually. All OK then.

Yes, we'll make it faster for this case, to ensure our design is correct.

certik · 2023-11-22T04:47:01Z

integration_tests/matmul_01.f90

@@ -142,7 +142,7 @@ program matmul_01
 real :: err

 ! Use n = 960 for a good benchmark
-n = 96
+n = 960


Suggested change

-n = 960
+n = 96

certik · 2023-11-22T04:47:52Z

Change n back to 96. Other than that, I think this looks good, thanks!

Shaikh-Ubaid · 2023-11-22T05:44:33Z

I think the reference tests need to be updated.

Shaikh-Ubaid

Apart from reference tests, it looks fine to me. Thanks for this.

Shaikh-Ubaid · 2023-11-22T05:51:01Z

src/libasr/pass/fma.cpp

@@ -123,6 +123,9 @@ class FMAVisitor : public PassUtils::SkipOptimizationFunctionVisitor<FMAVisitor>
    }

    void visit_Assignment(const ASR::Assignment_t& x) {
+        if (ASRUtils::is_simd_array(x.m_target)) {


It seems we skip the pass currently. @Thirumalai-Shaktivel Do you know if there is any fma operation for vectors?

Nope, I need to check the documentation.

Ok, please let us know as soon as there is any update.

Yes, fma is allowed for vector registers on x86 as well as other platforms.

We should represent it in ASR as "fma", then in the backend generate appropriate instructions.

Yup, for LLVM we can use llvm.fma.v3f32.
For the C backend, we could use immintrin.h on x86, but it is not available on M1, so I decided to let the C compiler handle it.
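To illustrate the "let the C compiler handle it" option, here is a minimal sketch using the GCC/Clang vector_size attribute. The names simd_f32x4, simd_fma and simd_store are hypothetical and are not LFortran output; four lanes are an assumption chosen so the type fits a 128-bit register. The function only writes a * b + c element-wise, and whether a fused instruction is emitted is left to the compiler, its flags, and the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: four packed floats via the GCC/Clang vector extension. */
typedef float simd_f32x4 __attribute__((vector_size(sizeof(float) * 4)));

/* Element-wise a * b + c. No intrinsics header is used; whether this becomes
   a fused multiply-add instruction depends on the compiler, flags and target. */
static simd_f32x4 simd_fma(simd_f32x4 a, simd_f32x4 b, simd_f32x4 c) {
    return a * b + c;
}

/* Copy the lanes into a plain array so callers can inspect them. */
static void simd_store(simd_f32x4 v, float *out, size_t n) {
    assert(n == 4);
    for (size_t i = 0; i < n; i++) {
        out[i] = v[i];
    }
}
```

Because the source never names an x86-only header, the same code should build unchanged on x86 and on Apple silicon with GCC or Clang.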

Shaikh-Ubaid · 2023-11-22T06:30:13Z

src/libasr/codegen/asr_to_c.cpp

@@ -1218,7 +1218,6 @@ R"(    // Initialise Numpy
            We need to generate:
            a = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
        */
-        CHECK_FAST_C(compiler_options, x)


Other visitors have it. Could you share why we need to remove this?

Consider simd_01 as an example: with the --fast option we visit ArrayConstant and generate the following function call:

a = (float __attribute__ (( vector_size(sizeof(float) * 8) ))) array_constant_r32dim(8, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00, 1.00000000000000000e+00)

I think this shouldn't be applied for SIMDArray.

Also, we have to make sure that ArrayBroadcast is visited in the backend only for SIMDArray assignments.
We have to add a TODO or create an issue for it.

OK, this is slightly concerning: the other visitors use it, but ArrayBroadcast would not. Is there another way to implement this so that ArrayBroadcast, like the other visitors, also uses CHECK_FAST_C()?

If it is complicated, then maybe we can support it in a separate PR.

If there is a design decision that you don't know the answer to, create an issue, describe the problem, etc.

And let's merge some solution so that --fast just works. Then we can iterate on it.

Creating a function and assigning its result (which is what CHECK_FAST_C does) is slower than a direct assignment. Also, it might not be possible to know the array type, as ArrayConstant will always be a FixedSizeArray.

In LLVM also, we don't use the m_value.
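The direct assignment can be sketched as follows, assuming the GCC/Clang vector extension with the same eight-float vector type that appears in the generated call. The helper broadcast_f32x8 is hypothetical and only stands in for the kind of code the C backend would emit; it is not taken from LFortran.

```c
#include <assert.h>
#include <string.h>

/* Eight packed floats, matching the vector_size used in the generated code. */
typedef float simd_f32x8 __attribute__((vector_size(sizeof(float) * 8)));

/* Hypothetical helper: broadcast one scalar into all eight lanes with a
   direct brace initializer, then copy the lanes out for inspection. */
static void broadcast_f32x8(float value, float out[8]) {
    simd_f32x8 a = {value, value, value, value, value, value, value, value};
    assert(sizeof(a) == 8 * sizeof(float));
    memcpy(out, &a, sizeof(a));
}
```

The initializer builds the vector in place, so no intermediate array-constant function call is involved.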

Thirumalai-Shaktivel · 2023-11-23T12:19:24Z

Ready!

Thirumalai-Shaktivel · 2023-11-24T02:46:14Z

Thanks for the approvals; if there is any issue, we will fix it iteratively!

Test: Relax the accuracy

08e2633

Thirumalai-Shaktivel commented Nov 22, 2023

Test: Update tests

63076d8

certik reviewed Nov 22, 2023

certik marked this pull request as draft November 22, 2023 04:47

Thirumalai-Shaktivel force-pushed the simd_05 branch from 436ab45 to fa99a5a Compare November 22, 2023 05:33

Shaikh-Ubaid approved these changes Nov 22, 2023

Test: Remove NOFAST for SIMDArray's

07b3226

Shaikh-Ubaid reviewed Nov 22, 2023

Thirumalai-Shaktivel force-pushed the simd_05 branch from fa99a5a to 8442bf3 Compare November 22, 2023 05:53

Thirumalai-Shaktivel marked this pull request as ready for review November 22, 2023 06:29

Shaikh-Ubaid reviewed Nov 22, 2023

Thirumalai-Shaktivel marked this pull request as draft November 23, 2023 04:07

Thirumalai-Shaktivel added the "asr pass" label Nov 23, 2023

Thirumalai-Shaktivel added 4 commits November 23, 2023 17:34

[ASR Pass] FMA: Create an IntrinsicScalarFunction for SIMDArray

54984c3

[C] Handle FMA for SIMDArray's

c939303

[LLVM] Handle casting from SIMDArray to DescriptorArray

c06f37c

Refactor: [C] Remove CHECK_FAST_C for ArrayBroadcast

a967489

Thirumalai-Shaktivel force-pushed the simd_05 branch from 8442bf3 to 483dfb8 Compare November 23, 2023 12:08

czgdp1807 approved these changes Nov 23, 2023

Thirumalai-Shaktivel force-pushed the simd_05 branch from 483dfb8 to 63076d8 Compare November 24, 2023 02:45

Thirumalai-Shaktivel marked this pull request as ready for review November 24, 2023 02:46

Thirumalai-Shaktivel enabled auto-merge November 24, 2023 02:48

Thirumalai-Shaktivel changed the title ~~[ASR Pass] Skip SIMDArray's in FMA pass~~ [ASR Pass] FMA: Create an IntrinsicScalarFunction for SIMDArray Nov 24, 2023

Thirumalai-Shaktivel merged commit cb439e5 into lfortran:main Nov 24, 2023
20 checks passed

Thirumalai-Shaktivel deleted the simd_05 branch November 24, 2023 03:26
