[NFC][flang][do concurrent] Add saxpy offload tests for OpenMP mapping #155993
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-offload

Author: Kareem Ergawy (ergawy)

Changes

Adds end-to-end tests for `do concurrent` offloading to the device.

Full diff: https://github.com/llvm/llvm-project/pull/155993.diff

2 Files Affected:
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
new file mode 100644
index 0000000000000..c6f576acb90b6
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n, m)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n, m
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:,:),intent(in) :: x
+ real(kind=real32), dimension(:,:),intent(inout) :: y
+ integer :: i, j
+
+ do concurrent(i=1:n, j=1:m)
+ y(i,j) = a * x(i,j) + y(i,j)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1,1) ",f8.6)') y(1,1)
+ write(*,'("y(n,m) ",f8.6)') y(n,m)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 1000, m=10000
+ real(kind=real32), allocatable, dimension(:,:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n,1:m), y(1:n,1:m))
+ a = 2.0_real32
+ x(:,:) = 1.0_real32
+ y(:,:) = 2.0_real32
+
+ call saxpy(a, x, y, n, m)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1,1) 4.0
+! CHECK: y(n,m) 4.0
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
new file mode 100644
index 0000000000000..e094a1d7459ef
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:),intent(in) :: x
+ real(kind=real32), dimension(:),intent(inout) :: y
+ integer :: i
+
+ do concurrent(i=1:n)
+ y(i) = a * x(i) + y(i)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1) ",f8.6)') y(1)
+ write(*,'("y(n) ",f8.6)') y(n)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 10000000
+ real(kind=real32), allocatable, dimension(:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n), y(1:n))
+ a = 2.0_real32
+ x(:) = 1.0_real32
+ y(:) = 2.0_real32
+
+ call saxpy(a, x, y, n)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1) 4.0
+! CHECK: y(n) 4.0
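
For comparison, here is a minimal sketch of a hand-written OpenMP version of the 1D loop. It assumes that `-fdo-concurrent-to-openmp=device` produces something roughly comparable to a combined `target teams distribute parallel do` construct, with `x` mapped `to` and `y` mapped `tofrom`. The constructs and map clauses the pass actually generates are not shown in this PR, and the program name `saxpy_explicit_omp` is hypothetical.

```fortran
! Hypothetical hand-written counterpart of the 1D test; not part of this PR.
program saxpy_explicit_omp
  use iso_fortran_env, only: real32
  implicit none

  integer, parameter :: n = 1000
  real(kind=real32), allocatable :: x(:), y(:)
  real(kind=real32) :: a
  integer :: i

  allocate(x(n), y(n))
  a = 2.0_real32
  x = 1.0_real32
  y = 2.0_real32

  ! Assumed explicit equivalent of: do concurrent(i=1:n)
  ! Without OpenMP enabled, the directives are plain comments and the
  ! loop runs serially on the host.
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
  !$omp end target teams distribute parallel do

  ! Every element should be 2.0 * 1.0 + 2.0 = 4.0.
  write(*,'("y(1) ",f8.6)') y(1)
  write(*,'("y(n) ",f8.6)') y(n)

  deallocate(x, y)
end program saxpy_explicit_omp
```

The tests in this PR check the same two values, plus a `Launching kernel` line from `LIBOMPTARGET_INFO=16` to confirm that the loop really ran on the device.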
Ping! Please have a look when you have time.
…ide values (#155754)

Following up on #154483, this PR introduces further refactoring to extract some shared utils between OpenMP lowering and `do concurrent` conversion pass. In particular, this PR extracts 2 utils that handle mapping or cloning values used inside target regions but defined outside. Later `do concurrent` PR(s) will also use these utils.

PR stack:
- #155754 ◀️
- #155987
- #155992
- #155993
- #156589
- #156610
- #156837
… tests (#155992)

Adds more lit tests for `do concurrent` device mapping.

PR stack:
- llvm/llvm-project#155754
- llvm/llvm-project#155987
- llvm/llvm-project#155992 ◀️
- llvm/llvm-project#155993
- llvm/llvm-project#157638
- llvm/llvm-project#156610
- llvm/llvm-project#156837
Adds end-to-end tests for `do concurrent` offloading to the device.
Thanks for the PR, LGTM.
…nMP mapping (#155993)

Adds end-to-end tests for `do concurrent` offloading to the device.

PR stack:
- llvm/llvm-project#155754
- llvm/llvm-project#155987
- llvm/llvm-project#155992
- llvm/llvm-project#155993 ◀️
- llvm/llvm-project#157638
- llvm/llvm-project#156610
- llvm/llvm-project#156837
Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and use the mapped value as the `private` clause operand in the nested `omp.parallel` op.

- #155754
- #155987
- #155992
- #155993
- #157638 ◀️
- #156610
- #156837
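
To make the `local` specifier concrete, here is a minimal sketch of the kind of source loop this mapping targets. The program name `local_sketch` and the scratch variable `tmp` are hypothetical and do not come from the PR. The sketch only shows the Fortran side, not the generated `omp.target` or `omp.parallel` ops.

```fortran
! Hypothetical example of a do concurrent loop with a local specifier.
program local_sketch
  use iso_fortran_env, only: real32
  implicit none

  integer, parameter :: n = 8
  real(kind=real32) :: x(n), y(n), tmp
  integer :: i

  x = 1.0_real32
  y = 2.0_real32

  ! local(tmp) gives every iteration its own copy of tmp, so iterations
  ! cannot observe each other's intermediate values.
  do concurrent (i = 1:n) local(tmp)
    tmp = 2.0_real32 * x(i)
    y(i) = tmp + y(i)
  end do

  ! Each element should be 2.0 * 1.0 + 2.0 = 4.0.
  write(*,'("y(1) ",f8.6)') y(1)
  write(*,'("y(n) ",f8.6)') y(n)
end program local_sketch
```

Per the description above, a variable like `tmp` would be mapped to the target region and then used as the `private` operand of the nested parallel region.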
Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wsloop` part.

- #155754
- #155987
- #155992
- #155993
- #157638
- #156610 ◀️
- #156837
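
As an illustration of the `reduce` specifier, here is a minimal sketch of a sum reduction written with `do concurrent`. The program name `reduce_sketch` and the variable `total` are hypothetical and do not come from the PR. The sketch assumes a compiler that accepts the Fortran 2023 `reduce` locality specifier.

```fortran
! Hypothetical example of a do concurrent loop with a reduce specifier.
program reduce_sketch
  use iso_fortran_env, only: real32
  implicit none

  integer, parameter :: n = 1000
  real(kind=real32) :: x(n), total
  integer :: i

  x = 1.0_real32
  total = 0.0_real32

  ! reduce(+:total) declares total as a sum-reduction variable, so partial
  ! sums from different iterations are combined safely.
  do concurrent (i = 1:n) reduce(+:total)
    total = total + x(i)
  end do

  ! The sum of n ones should equal n.
  write(*,'("total ",f8.2)') total
end program reduce_sketch
```

Under the mapping described above, a loop of this shape would carry the reduction on both the `teams` level and the worksharing-loop level of the generated construct.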
…ions on the GPU (#156837)

Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used, resulting in emitting BBs with multiple terminators.

PR stack:
- llvm/llvm-project#155754
- llvm/llvm-project#155987
- llvm/llvm-project#155992
- llvm/llvm-project#155993
- llvm/llvm-project#157638
- llvm/llvm-project#156610
- llvm/llvm-project#156837 ◀️
Adds end-to-end tests for `do concurrent` offloading to the device.

PR stack:
- `do concurrent` mapping to device #155987
- `do concurrent` to device mapping lit tests #155992
- `do concurrent`: support `local` on device #157638
- `do concurrent`: support `reduce` on device #156610