[C] Handle BinOp for SIMDArray #2836

Merged
merged 9 commits into from
Nov 16, 2023

Thirumalai-Shaktivel
Member

Towards #2293

@certik
Contributor

certik commented Nov 10, 2023

Is this ready for review?

Contributor

@certik certik left a comment

I think it's fine. I don't know if this is ready for review.

@Thirumalai-Shaktivel
Member Author

Yup, this is ready!
The remaining issue in matmul_01 is x = a + b * c(1:4)

@Thirumalai-Shaktivel
Member Author

From Zulip:
For a = c(1:2), we actually want to also use the vector extensions in C, not a loop.

I'm marking this as a draft for now, so that I can implement SIMD assignment using vector extensions.

@Thirumalai-Shaktivel
Member Author

Thirumalai-Shaktivel commented Nov 14, 2023

I did some research and came across the following:
Scalar initialization or broadcast

[...]
!LF$ attributes simd :: A
real :: A(4)
A = 23.
! or
A = i ! value: `23.`
[...]

We can do:

// C code
[...]
A = (float __attribute__ (( vector_size(sizeof(float) * 4) ))){23., 23., 23., 23.};
// or
A = (float __attribute__ (( vector_size(sizeof(float) * 4) ))){i, i, i, i};
// or
A = i - (float __attribute__ (( vector_size(sizeof(float) * 4) ))){ };
// See for more details: https://stackoverflow.com/a/43801280/15913193
[...]

I thought of doing the third one.
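
For reference, here is a minimal sketch of how the list form and the subtraction form above behave under the GCC/Clang vector extension. The alias `f32x4` and all helper names are hypothetical and exist only for this illustration. The zero vector is written `{0}` instead of `{ }` so that the sketch does not depend on empty initializers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical alias for a 4-lane float vector (GCC/Clang vector extension). */
typedef float f32x4 __attribute__((vector_size(sizeof(float) * 4)));

/* Second form from the comment: one initializer per lane. */
static f32x4 broadcast_list(float i) {
    return (f32x4){i, i, i, i};
}

/* Third form: scalar minus a zero vector; the scalar is widened to all lanes. */
static f32x4 broadcast_sub(float i) {
    return i - (f32x4){0};
}

/* Bytewise comparison of two vectors, without indexing individual lanes. */
static int same_lanes(f32x4 a, f32x4 b) {
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Both forms are expected to agree for an ordinary finite value. */
static void check_forms_agree(float i) {
    assert(same_lanes(broadcast_list(i), broadcast_sub(i)));
}
```

Calling `check_forms_agree(23.0f)` exercises both forms on the value used in the Fortran example. The subtraction form has the practical advantage that the generated text does not depend on the number of lanes.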

Array initialisation:

[...]
!LF$ attributes simd :: A
real :: A(4), C(8)
C = 3.
A = C(:4)
[...]

We can do:

// C code
[...]
float a __attribute__ (( vector_size(sizeof(float) * 4) ));
struct r32 c_value;
struct r32* c = &c_value;
float c_data[8];
c->data = c_data;
c->n_dims = 1;
c->dims[0].lower_bound = 1;
c->dims[0].length = 8;
memcpy(&a, c->data, sizeof(float) * 4);
[...]

@certik do you know any other builtin functions that we can use here?

@Thirumalai-Shaktivel
Member Author

Thirumalai-Shaktivel commented Nov 14, 2023

Also, should we predefine the types as the following in lfortran_intrinsics.h?

typedef float   v8float  __attribute__ ((vector_size (32)));   /* float[8],  AVX  */
typedef double  v4double  __attribute__ ((vector_size (32)));  /* double[4], AVX  */
typedef float   v4float  __attribute__ ((vector_size (16)));   /* float[4],  SSE  */

See: https://www.linuxquestions.org/questions/programming-9/how-do-you-use-sse-2-3-in-c-c-code-884780-print/

@certik
Contributor

certik commented Nov 14, 2023

do you know any other builtin functions that we can use here?

Do you mean in our C++ code? Not sure.

I would not predefine v4float, since the lengths will differ on each architecture.
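
One way to avoid fixed names such as `v4float` is to state the element type and lane count at each use, as the generated code elsewhere in this thread already does. The macro below is only a sketch of that idea: `SIMD_VEC`, `vec_f32_4`, and `vec_f64_4` are hypothetical names, not part of `lfortran_intrinsics.h`, and the code assumes the GCC/Clang `vector_size` attribute and a C11 compiler.

```c
#include <assert.h>

/* Hypothetical helper: build a vector type from an element type and a lane count. */
#define SIMD_VEC(T, N) T __attribute__((vector_size(sizeof(T) * (N))))

/* Example instantiations; a code generator could emit SIMD_VEC(...) inline instead. */
typedef SIMD_VEC(float, 4) vec_f32_4;
typedef SIMD_VEC(double, 4) vec_f64_4;

/* The byte size follows from the element type, so nothing is hard-coded per target. */
static_assert(sizeof(vec_f32_4) == 4 * sizeof(float), "four float lanes");
static_assert(sizeof(vec_f64_4) == 4 * sizeof(double), "four double lanes");
```

The lane count stays a parameter chosen by whoever emits the declaration, so the header does not have to commit to SSE or AVX widths.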

Comment on lines 33 to 35
for (__1_t=0; __1_t<=7; __1_t++) {
a[__1_t] = (float)(1);
}
Member

Is this the intention? I am confused, the older version feels correct. This loop is just normal C code, correct?

Member

Same here. I think we should ideally avoid accessing individual vector elements and leave it up to the processor to handle vector operations.

Member Author

Yeah, I thought the same. I'm working on it and will push the changes soon.
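
To illustrate the reviewers' point, the sketch below contrasts a whole-vector expression with the lane-by-lane loop, using the `a + b * c` shape mentioned earlier for matmul_01. It assumes the GCC/Clang vector extension. The alias `f32x4` and the function names are hypothetical, and this is not the code the backend emits.

```c
#include <assert.h>
#include <string.h>

typedef float f32x4 __attribute__((vector_size(sizeof(float) * 4)));

/* Whole-vector form: no lane is indexed; the compiler picks the instructions. */
static f32x4 add_mul_vec(f32x4 a, f32x4 b, f32x4 c) {
    return a + b * c;
}

/* Lane-by-lane reference, i.e. the ordinary C loop questioned in the review. */
static void add_mul_loop(const float *a, const float *b, const float *c, float *x) {
    for (int k = 0; k < 4; k++) {
        x[k] = a[k] + b[k] * c[k];
    }
}

/* Check that both forms agree on the given four-element inputs. */
static void check_add_mul(const float *a, const float *b, const float *c) {
    f32x4 va, vb, vc, vx;
    float expect[4];
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    memcpy(&vc, c, sizeof vc);
    vx = add_mul_vec(va, vb, vc);
    add_mul_loop(a, b, c, expect);
    assert(memcmp(&vx, expect, sizeof expect) == 0);
}
```

The comparison is bytewise, so it is best exercised with values whose products and sums are exactly representable, which keeps the check independent of whether the compiler fuses the multiply and add.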

@Thirumalai-Shaktivel
Member Author

Current design:
Fortran example

!LF$ attributes simd :: A
real :: A(4), C(8)
real :: i = 12.
A = i

C Backend

a = (float __attribute__ (( vector_size(sizeof(float) * 4) ))) {i, i, i, i};

Fortran example

A = 1.2

C Backend

a = (float __attribute__ (( vector_size(sizeof(float) * 4) ))) {  1.19999999999999996e+00,   1.19999999999999996e+00,   1.19999999999999996e+00,   1.19999999999999996e+00};

Fortran example

C = 42
A = C

C Backend

memcpy(&a, c->data, sizeof(float) * 4);

Fortran example

A = C(2:)

C Backend

memcpy(&a, c->data + (2 - c->dims[0].lower_bound), sizeof(float) * 4);
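
The two `memcpy` cases above can be tried in isolation with a small stand-in for the array descriptor. In the sketch below, `struct r32_desc` is a simplified, hypothetical version of the descriptor: only the fields that appear in this thread (`data`, `n_dims`, `dims[].lower_bound`, `dims[].length`) are modelled, and `load_section` is an illustrative helper rather than generated code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef float f32x4 __attribute__((vector_size(sizeof(float) * 4)));

/* Simplified stand-in for the descriptor; the real struct may differ. */
struct dim {
    int32_t lower_bound;
    int32_t length;
};
struct r32_desc {
    float *data;
    int32_t n_dims;
    struct dim dims[1];
};

/* Copy four floats starting at Fortran index `start` into a SIMD vector. */
static f32x4 load_section(const struct r32_desc *c, int32_t start) {
    int32_t offset = start - c->dims[0].lower_bound;
    f32x4 a;
    /* Guard against reading outside the source array. */
    assert(offset >= 0 && offset + 4 <= c->dims[0].length);
    memcpy(&a, c->data + offset, sizeof a);
    return a;
}
```

With an eight-element array whose lower bound is 1, `load_section(&c, 1)` corresponds to the `A = C` case and `load_section(&c, 2)` to the `A = C(2:)` case. The bounds assertion is an addition for the sketch and is not part of the design described above.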

@Thirumalai-Shaktivel
Member Author

Ready for review!

Contributor

@certik certik left a comment

I think that this is fine. However, I think only the simd version is being tested, not the non-simd version, correct? I think we need to update our cmake tester to test both versions.

}
array_const_str += "}";
src = "(" + cast + ") " + array_const_str;
Member

Why do we cast here? This seems hard-coded, whereas the previous approach, where we insert the cast while visiting visit_ArrayPhysicalCast(), is more general, I think.

Member Author

I think ArrayBroadcast should only be used for SIMDArray; otherwise we should throw an error.
And for SIMDArray we always need a cast, so I moved it here.

Member Author

I don't see a case where this operation has to be performed in visit_ArrayPhysicalCast. If it is required, we will use it there as well.

Member

I think ArrayBroadcast should only be used for SIMDArray

I think it can be used for regular arrays as well. At the moment, we only support array broadcast for SIMD arrays at the C backend level. I would keep the implementation generalised as before instead of hardcoding.

And for SIMDArray we always need a cast, so I moved it here.

I think we can't say that a cast would always be needed. Consider if we are broadcasting an SIMD array. Then I think a cast is not needed here.

I think the backend should just be completely "dumb" and just follow what the ASR says. If the ASR has a cast node, then the backend adds the cast operation. If there is no cast node in the ASR, then there should be no cast generated by the backend.
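
The "dumb backend" idea can be shown with a toy emitter that prints a cast only when the tree contains a cast node. This is a deliberately simplified sketch: the `Expr` type, its fields, and `emit_expr` are hypothetical and much smaller than the real ASR and C backend.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy expression tree: a variable, or a cast applied to another expression. */
typedef enum { EXPR_VAR, EXPR_CAST } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *text;        /* variable name, or target type of the cast */
    const struct Expr *arg;  /* operand of a cast; NULL for a variable */
} Expr;

/* Emit C source for `e` into `out`. A cast is written only for a cast node. */
static void emit_expr(const Expr *e, char *out, size_t cap) {
    if (e->kind == EXPR_CAST) {
        char inner[128];
        assert(e->arg != NULL);
        emit_expr(e->arg, inner, sizeof inner);
        snprintf(out, cap, "(%s) %s", e->text, inner);
    } else {
        snprintf(out, cap, "%s", e->text);
    }
}
```

In this arrangement the decision to cast is made when the tree is built, and the emitter never adds one on its own, which is the separation being argued for in the comment above.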

Member Author

I think it can be used for regular arrays as well.

Nope, the ArrayBroadcast design is that we handle all the ArrayBroadcast in the array_op itself except for the SIMDArray case, i.e., we shouldn't be visiting ArrayBroadcast in the backends except for SIMDArray

Regarding the cast, I had some problems adding it in visit_ArrayPhysicalCast, so I moved it here. I will look into it and report back.

Member Author

Consider if we are broadcasting an SIMD array. Then I think a cast is not needed here.

I think SIMDArray would have an assignment here:

(=
    (Var 4 a)
    (Var 4 b)
    ()
)

Member Author

Done

Member

we shouldn't be visiting ArrayBroadcast in the backends except for SIMDArray

Nope, the ArrayBroadcast design is that we handle all the ArrayBroadcast in the array_op itself except for the SIMDArray case, i.e., we shouldn't be visiting ArrayBroadcast in the backends except for SIMDArray

Yes, I know the array_op pass handles the ArrayBroadcast. But think about it from a general perspective. In general, I think ArrayBroadcast is meant for all types of arrays. If you plan to support only SIMD arrays for now or in the future, I would add an LCompilersAssert(is_simd_array()) (or anything similar) so that it indicates that only SIMD arrays are supported and helps us catch unexpected bugs in ArrayBroadcast (for example, when some other type of array gets passed to ArrayBroadcast).

I think SIMDArray would have an assignment here:

Consider a with length 128 and b with length 256 and we have b = a. I think here we need to broadcast a so that it matches the length of b.

Member Author

I think we wouldn't be able to add the LCompilersAssert(is_simd_array()) check in ArrayBroadcast, as the ArrayBroadcast type will always be a FixedSizeArray. It might be possible in visit_Assignment or visit_ArrayPhysicalCast.

Consider a with length 128 and b with length 256 and we have b = a. I think here we need to broadcast a so that it matches the length of b.

We shouldn't allow assignment between arrays of different sizes, right?
GFortran throws an error for it:

$ gfortran examples/expr2.f90  && ./a.out
examples/expr2.f90:6:0:

    6 | y = x
      | 
Error: Different shape for array assignment at (1) on dimension 1 (256 and 128)

Member

Ok, fine for now. We can also add the assert later as the design evolves.

Comment on lines +1351 to +1349
} else if (ASRUtils::is_simd_array(x.m_v)) {
index += src;
Member

I would ideally not implement array item for SIMD arrays until it is absolutely necessary. I think we should avoid using it as much as possible, ideally never.

Member Author

Yes, I used this for the print statement, just for debugging. If this is not required, I can remove it.

Member Author

Should we introduce an option, like --print-simd?

Member Author

cc @certik

Member

Should we introduce an option, like --print-simd?

I do not have any opinion on this. I would just do my best to avoid as many if (is_simd_array()) / else branches as possible, so that the code stays clean.

Contributor

What should "--print-simd" do?

Member Author

My question was: should we print a SIMDArray?
The print_arr pass converts the SIMDArray to a do loop to print the values; should we allow it?

print *, a

--show-fortran

do __1_k = lbound(a, 1), ubound(a, 1)
    print *, a(__1_k)
end do
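
For comparison, one way to print a SIMD vector in C without indexing the vector itself is to copy it into a plain array first. This is only a sketch of that alternative under the GCC/Clang vector extension, not what the print_arr pass or this PR does. The alias `f32x4` and the helper `format_vec` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef float f32x4 __attribute__((vector_size(sizeof(float) * 4)));

/* Copy the vector into an ordinary array, then format that array.
   Returns the value reported by snprintf. */
static int format_vec(f32x4 v, char *out, size_t cap) {
    float lanes[4];
    assert(out != NULL && cap > 0);
    memcpy(lanes, &v, sizeof lanes);
    return snprintf(out, cap, "%g %g %g %g",
                    lanes[0], lanes[1], lanes[2], lanes[3]);
}
```

The loop or formatting code then only ever sees a regular `float` array, so no array-item support on the vector type is needed for printing.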

Member Author

The changes in cf65b41 and 53de75b were introduced for the above do loop.

Comment on lines -1285 to +1279
src = "((" + result_type + ")" + var_name + "->dims[" + idx + "-1].lower_bound)";
if (ASRUtils::is_simd_array(x.m_v)) {
src = "0";
} else {
Member

I think lower_bound for an SIMD array is not meaningful. I am unsure, but I think this case should never be triggered for SIMD arrays.

Member Author

Same, used for printing the output

Member

Again, no opinion on this. Most likely it would not get triggered, so it seems like dead code to me at the moment.

Member Author

Okay, I thought of removing them for now.

Member Author

I would keep this for further implementation of SIMDArray, as it would be helpful for debugging.
I will create an issue to remove it.

Member

Please post a link to the issue here.

Member Author

@Shaikh-Ubaid
Member

Thanks for the contributions, Thirumalai. I shared some comments above. Overall, it looks good to me.

@Shaikh-Ubaid
Member

Looking at the changes in the backend, it seems (as we expected) that several if (simd_array()) / else branches are needed. This seems to point in the direction of a separate type for Vector Array (https://github.com/lcompilers/lpython/wiki/Design-of-Vector-Arrays-in-ASR).

I think as we dive deeper, the design might get clearer.

@certik
Contributor

certik commented Nov 15, 2023

What needs to be done to finish this?

res = A + B
C(:4) = res
res = A * B
C(5:) = res
Member

Could you add a print *, C here (before the if assert) so that it is helpful for debugging later?

Member Author

I think it would be useless for the CI, so let's not do it. The developer can always add a print *, C and test it, but not git stage it.

Member

I think it would be useless for the CI, so let's not do it. The developer can always add a print *, C and test it, but not git stage it.

I think the tests are not just for the CI but also to help developers debug an issue (when any arises). I think there is no disadvantage in printing values on the console, and there are advantages:

  • It helps developers debug the test (without making changes to it).
  • Most important: it ensures that the test actually runs and produces values, which developers can see on the console. I have previously seen tests that had asserts (and no prints) where the asserts never ran, because the function containing them was removed by the unused_functions pass. The test would pass (because no checking was being done) and the developer got the false impression that it worked correctly. I think a print would have been very helpful in avoiding this situation: the developer would have noticed when adding the test that no value was printed on the console (because the print is removed along with the function).

Member Author

I didn't know it would print an output on the console on failure. In that case, I think we can add a print statement.

Member Author

Done.
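
As a C analogue of the print-then-assert pattern agreed on above, the sketch below mirrors the shape of the Fortran test (`C(:4) = A + B`, `C(5:) = A * B`) with the GCC/Clang vector extension. The input values are made up for the illustration, and `run_binop_test` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef float f32x4 __attribute__((vector_size(sizeof(float) * 4)));

/* Returns the number of values checked. Inputs are illustrative only. */
static int run_binop_test(void) {
    const float a_in[4] = {1, 2, 3, 4};
    const float b_in[4] = {5, 6, 7, 8};
    const float expected[8] = {6, 8, 10, 12, 5, 12, 21, 32};
    float c[8];
    f32x4 a, b, res;

    memcpy(&a, a_in, sizeof a);
    memcpy(&b, b_in, sizeof b);

    res = a + b;                      /* C(:4) = A + B */
    memcpy(c, &res, sizeof res);
    res = a * b;                      /* C(5:) = A * B */
    memcpy(c + 4, &res, sizeof res);

    /* Print first: if this line never appears, the checks below never ran. */
    for (int k = 0; k < 8; k++) {
        printf("%g ", (double)c[k]);
    }
    printf("\n");

    for (int k = 0; k < 8; k++) {
        assert(c[k] == expected[k]);
    }
    return 8;
}
```

A missing output line is then a visible sign that the test body was skipped or removed, which an assert on its own cannot signal.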

} else {
value += "->data";
}
} else if (ASR::is_a<ASR::Var_t>(*x.m_target)) {
Member

Should this instead be the following?

Suggested change
- } else if (ASR::is_a<ASR::Var_t>(*x.m_target)) {
+ } else if (ASR::is_a<ASR::Var_t>(*x.m_value)) {

Also, based on the if-else conditions, I think the above case is currently unused (effectively dead code)?

Member Author

Done.

Member

Done.

Could you specify what part was updated? I can see no change in the above if statement. Did you push your changes?

Member

Now, it seems updated. Thanks.

@Shaikh-Ubaid
Member

I think reference tests need to be updated.

@Shaikh-Ubaid
Member

@Thirumalai-Shaktivel Could you also check if the above changes work with LPython by submitting a PR?

@Thirumalai-Shaktivel
Member Author

@Shaikh-Ubaid, you can do a final review now.

Yup, I was planning to do the same, after the review.

Member

@Shaikh-Ubaid Shaikh-Ubaid left a comment

It seems good to me. Thanks for this.

If these changes work with LPython, I think it is good to merge.

@Thirumalai-Shaktivel
Member Author

LPython: lcompilers/lpython#2425

@Thirumalai-Shaktivel
Member Author

Thanks for the review!

@Thirumalai-Shaktivel Thirumalai-Shaktivel marked this pull request as ready for review November 16, 2023 09:07
@Thirumalai-Shaktivel Thirumalai-Shaktivel merged commit 92e744a into lfortran:main Nov 16, 2023
20 checks passed
@Shaikh-Ubaid
Member

Shaikh-Ubaid commented Nov 16, 2023

It seems this PR is merged, but the related PR is still open (I think it might be waiting for approval). Please merge the PR and its related PRs together or at similar times (ideally when there is approval on all of them). This will help keep libasr intact and make it easier to contribute libasr changes to both projects. Thanks for the contributions. I appreciate it.
