# Using OpenACC in modular programming

---
**Requirements:**

- [Get started](./Get_started.ipynb)
- [Data management](./Data_management.ipynb)
- [Loop configuration](./Loop_configuration.ipynb)

---

Most modern codes use modular programming to make them easier to read and maintain.
You will have to deal with it in your own code and make sure that every function is accessible wherever you need it.

If you call a function inside a kernel, then you need to tell the compiler to create a version for the GPU.
With OpenACC you have to use the `acc routine` directive for this purpose.

With Fortran you also have to take care of variables that are declared inside modules: use `acc declare create` to make them available on the GPU.
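As a minimal sketch of the module-variable case, assume a hypothetical module `physics_constants` whose variable `scale_factor` is read by a device routine. `acc declare create` allocates the variable on the GPU, and `acc update device` copies the host value before the kernel reads it. All names here are illustrative and not part of the training examples.

```fortran
module physics_constants
    use iso_fortran_env, only : REAL64
    implicit none
    ! Hypothetical module variable that device code needs to read
    real(kind=REAL64) :: scale_factor = 1.0_real64
    ! Allocate a device copy of the module variable
    !$acc declare create(scale_factor)
contains
    function scaled(x) result(y)
        !$acc routine seq
        real(kind=REAL64), intent(in) :: x
        real(kind=REAL64)             :: y
        y = scale_factor * x
    end function scaled
end module physics_constants

program declare_create_demo
    use iso_fortran_env, only : REAL64
    use physics_constants
    implicit none
    integer, parameter :: n = 8
    real(kind=REAL64)  :: values(n)
    integer            :: i

    ! The host value changes, so the device copy must be refreshed
    scale_factor = 2.5_real64
    !$acc update device(scale_factor)

    !$acc parallel loop copyout(values)
    do i = 1, n
        values(i) = scaled(real(i, kind=REAL64))
    enddo

    print *, values
end program declare_create_demo
```

Since `create` only allocates memory on the device, forgetting the `update device` would leave the GPU copy of `scale_factor` uninitialized.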

## `acc routine <max_level_of_parallelism>`

This directive is used to tell the compiler to create a function for the GPU as well as for the CPU.
Since the function is available for the GPU you will be able to call it inside a kernel.

When you use this directive you sign a contract with the compiler (normally no soul selling, but check it twice!)
and promise that the function will only be called from a section of code in which work sharing at this level is not yet activated.
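The clauses available are:

- gang: the function may contain loops at the _gang_, _worker_ and _vector_ levels
- worker: the function may contain loops at the _worker_ and _vector_ levels
- vector: the function may contain loops at the _vector_ level only
- seq: the function is executed sequentially by one GPU thread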

In Fortran, the directive is added in the specification part of the function or subroutine:
```fortran
subroutine mean_value(array, array_size)
!$acc routine seq
    ! compute the mean value
end subroutine mean_value
```
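To see the directive in a complete program, here is a small sketch in which a `seq` routine is called from a kernel. The module `geometry` and the function `norm2d` are hypothetical names chosen for illustration.

```fortran
module geometry
    implicit none
contains
    function norm2d(x, y) result(r)
        ! One GPU thread executes the whole function
        !$acc routine seq
        real, intent(in) :: x, y
        real             :: r
        r = sqrt(x*x + y*y)
    end function norm2d
end module geometry

program routine_seq_demo
    use geometry
    implicit none
    integer, parameter :: n = 1000
    real               :: x(n), y(n), r(n)
    integer            :: i

    x(:) = 3.0
    y(:) = 4.0

    ! Each iteration calls the device version of norm2d
    !$acc parallel loop copyin(x, y) copyout(r)
    do i = 1, n
        r(i) = norm2d(x(i), y(i))
    enddo

    ! Every element of r should hold sqrt(9 + 16) = 5
    print *, r(1), r(n)
end program routine_seq_demo
```

Because `norm2d` is declared `seq`, it can be called from a loop that uses any level of parallelism.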

### Wrong examples

Since it might be a bit tricky, here are some wrong examples with an explanation.

This example is wrong because `acc parallel loop worker` activates work sharing at the _worker_ level of parallelism.
The `acc routine worker` directive indicates that the function can activate the _worker_ and _vector_ levels of parallelism, and you cannot activate the same level twice.
```fortran
subroutine my_worker_subroutine
    !$acc routine worker
    ...
end subroutine

...
!$acc parallel loop worker
do i=1, sys_size
    call my_worker_subroutine
enddo
```

For a similar reason this is forbidden: `acc loop gang` already activates the _gang_ level that the `gang` routine is allowed to use.
```fortran
subroutine my_gang_subroutine
    !$acc routine gang
    ...
end subroutine

...
!$acc parallel
!$acc loop gang
do i=1, sys_size
    call my_gang_subroutine
enddo

!$acc end parallel
```

This example is wrong since it breaks the promise you make to the compiler.
A vector routine cannot have loops at the _gang_ or _worker_ levels of parallelism.
```fortran
subroutine my_wrong_subroutine
!$acc routine vector
    ...
    !$acc loop gang worker
    do i=0, sys_size
        ! some loop stuff
    enddo
end subroutine
```
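For contrast, here is a sketch of a combination that respects the contract: the caller only activates the _gang_ level, and the routine, declared `worker`, uses the _worker_ and _vector_ levels. The names `row_ops` and `scale_row` are hypothetical.

```fortran
module row_ops
    implicit none
contains
    subroutine scale_row(row, n, factor)
        ! The routine may use the worker and vector levels
        !$acc routine worker
        integer, intent(in)    :: n
        real,    intent(inout) :: row(n)
        real,    intent(in)    :: factor
        integer                :: j
        !$acc loop worker vector
        do j = 1, n
            row(j) = factor * row(j)
        enddo
    end subroutine scale_row
end module row_ops

program valid_levels
    use row_ops
    implicit none
    integer, parameter :: nx = 64, ny = 16
    real               :: table(nx, ny)
    integer            :: i

    table(:,:) = 1.0

    ! Only the gang level is activated here, so the worker routine is allowed
    !$acc parallel loop gang copy(table)
    do i = 1, ny
        call scale_row(table(:, i), nx, real(i))
    enddo

    ! Column i was multiplied by i: expect 1 and 16
    print *, table(1, 1), table(1, ny)
end program valid_levels
```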

## Named `acc routine(name) <max_level_of_parallelism>`

With the named form you do not have to write the directive inside the routine itself.
It can appear in the specification part of a subroutine, a function or a module, and it applies to the routine whose name is given.

```fortran
module utils
    use openacc
    interface beautiful
        module procedure beautiful_name
        !$acc routine(beautiful_name) vector
    end interface
    contains
       subroutine beautiful_name(name)
       ! Do something
       end subroutine beautiful_name

end module utils

program another_brick
    use utils
    ...
    !$acc routine(beautiful_name) vector
    ...
    call beautiful(name)
    ! integers are beauty
end program another_brick
```
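A smaller, self-contained sketch of the named form is given below, assuming your compiler accepts the directive in the specification part of a module as the OpenACC specification describes. The module `vector_utils` and the subroutine `axpy_one` are hypothetical names.

```fortran
module vector_utils
    implicit none
    ! Named form: applies to the module procedure axpy_one defined below
    !$acc routine(axpy_one) seq
contains
    subroutine axpy_one(a, x, y)
        real, intent(in)    :: a, x
        real, intent(inout) :: y
        y = a * x + y
    end subroutine axpy_one
end module vector_utils

program named_routine_demo
    use vector_utils
    implicit none
    integer, parameter :: n = 32
    real               :: x(n), y(n)
    integer            :: i

    x(:) = 1.0
    y(:) = 2.0

    !$acc parallel loop copyin(x) copy(y)
    do i = 1, n
        call axpy_one(3.0, x(i), y(i))
    enddo

    ! Every element should hold 3*1 + 2 = 5
    print *, y(1), y(n)
end program named_routine_demo
```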

## Directives inside an `acc routine`

Routines you declare with `acc routine` shall not contain directives that create kernels (_parallel_, _serial_, _kernels_).
You have to consider that the content of the function is already inside a kernel.

```fortran
subroutine init(array, array_size)
!$acc routine vector
integer, dimension(:), intent(inout) :: array
integer, intent(in)                  :: array_size
integer                              :: i
    !$acc loop
    do i = 1, array_size
        array(i) = i
    enddo
end subroutine init
```
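The kernel is therefore created by the caller. As a sketch, the `init` routine above could be placed in a module (named `init_mod` here, a hypothetical name) and called from a `parallel` region; `num_gangs(1)` is an assumption made so that a single gang fills the array.

```fortran
module init_mod
    implicit none
contains
    subroutine init(array, array_size)
        !$acc routine vector
        integer, dimension(:), intent(inout) :: array
        integer, intent(in)                  :: array_size
        integer                              :: i
        ! No parallel/serial/kernels here: we are already inside a kernel
        !$acc loop
        do i = 1, array_size
            array(i) = i
        enddo
    end subroutine init
end module init_mod

program call_init
    use init_mod
    implicit none
    integer, parameter :: n = 1000
    integer            :: array(n)

    array(:) = 0

    ! The caller creates the kernel; the vector level is not active yet
    !$acc parallel num_gangs(1) copy(array)
    call init(array, n)
    !$acc end parallel

    ! Expect 1 and 1000
    print *, array(1), array(n)
end program call_init
```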

## Exercise

In this exercise, you have to compute the mean value of each row of a matrix.
The value is computed by a function `mean_value` working on one row at a time.
This function can use parallelism.

To get correct results, you will need to make the variable `local_mean` private to each thread.
To achieve this, use the _private(vars, ...)_ clause of the `acc loop` directive.
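As a reminder of how the clause is written, here is a minimal sketch that is unrelated to the exercise code; the variable `tmp` and the arrays `a` and `b` are hypothetical.

```fortran
program private_demo
    implicit none
    integer, parameter :: n = 16
    real               :: a(n), b(n), tmp
    integer            :: i

    do i = 1, n
        a(i) = real(i)
    enddo

    ! Each thread works on its own copy of tmp
    !$acc parallel loop private(tmp) copyin(a) copyout(b)
    do i = 1, n
        tmp  = 2.0 * a(i)
        b(i) = tmp + 1.0
    enddo

    ! Expect 3 and 33
    print *, b(1), b(n)
end program private_demo
```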

Example stored in: `../../examples/Fortran/Modular_programming_mean_value_exercise.f90`

```fortran
%%idrrun -a
module calcul
    use iso_fortran_env, only : INT32, REAL64
    contains
        subroutine rand_init(array,n)
            real   (kind=REAL64), dimension(1,n), intent(inout) :: array
            integer(kind=INT32 ), intent(in)                    :: n
            real   (kind=REAL64)                                :: rand_val
            integer(kind=INT32)                                 :: i

            call srand(12345900)
            do i = 1, n
               call random_number(rand_val)
               array(1,i) = 2.0_real64*(rand_val-0.5_real64)
            enddo

        end subroutine rand_init

        subroutine iterate(array, array_size, cell_size)
            real   (kind=REAL64), dimension(array_size,1), intent(inout) :: array
            integer(kind=INT32 ), intent(in)                             :: array_size, cell_size
            real   (kind=REAL64)                                         :: local_mean
            integer(kind=INT32 )                                         :: i

            do i = cell_size/2, array_size-cell_size/2
                local_mean = mean_value(array(i+1-cell_size/2:i+cell_size/2,1), cell_size)
                if (local_mean .lt. 0.0_real64) then
                    array(i,1) = array(i,1) + 0.1
                else
                    array(i,1) = array(i,1) - 0.1
                endif
            enddo
        end subroutine iterate

        function mean_value(t, n)
            real   (kind=REAL64), dimension(n,1), intent(inout) :: t
            integer(kind=INT32 ), intent(in)                    :: n
            real   (kind=REAL64)                                :: mean_value
            integer(kind=INT32 )                                :: i
            mean_value = 0.0_real64
            do i = 1, n
                mean_value = mean_value + t(i,1)
            enddo
            mean_value = mean_value / dble(n)
        end function mean_value
end module calcul

program modular_programming
    use calcul
    implicit none

    real   (kind=REAL64), dimension(:,:), allocatable :: table
    real   (kind=REAL64), dimension(:)  , allocatable :: mean_values
    integer(kind=INT32 )                              :: nx, ny, cell_size, i

    nx =  500000
    ny =    3000
    allocate(table(nx,ny), mean_values(ny))
    table(:,:) = 0.0_real64
    call rand_init(table(1,:),ny)
    cell_size = 32
    do i = 2, ny
        call iterate(table(:,i), nx, cell_size)
    enddo

    do i = 1, ny
        mean_values(i) = mean_value(table(:,i), nx)
    enddo

    do i = 1, 10
        write(0,"(a18,i5,a1,f20.8)") "Mean value of row ",i,"=",mean_values(i)
    enddo

    do i = ny-10, ny
        write(0,"(a18,i5,a1,f20.8)") "Mean value of row ",i,"=",mean_values(i)
    enddo

    deallocate(table, mean_values)
end program modular_programming
```

### Solution

Example stored in: `../../examples/Fortran/Modular_programming_mean_value_solution.f90`

```fortran
%%idrrun -a
module calcul
    use iso_fortran_env, only : INT32, REAL64
    use openacc
    contains
        subroutine rand_init(array,n)
            real   (kind=REAL64), dimension(1,n), intent(inout) :: array
            integer(kind=INT32 ), intent(in)                    :: n
            real   (kind=REAL64)                                :: rand_val
            integer(kind=INT32)                                 :: i

            call srand(12345900)
            do i = 1, n
               call random_number(rand_val)
               array(1,i) = 2.0_real64*(rand_val-0.5_real64)
            enddo
        end subroutine rand_init

        subroutine iterate(array, array_size, cell_size)
        !$acc routine worker
            real   (kind=REAL64), dimension(1:array_size,1), intent(inout) :: array
            integer(kind=INT32 ), intent(in)                               :: array_size, cell_size
            real   (kind=REAL64)                                           :: local_mean
            integer(kind=INT32 )                                           :: i

            !$acc loop seq
            do i = cell_size/2, array_size-cell_size/2
                local_mean = mean_value(array(i+1-cell_size/2:i+cell_size/2,1), cell_size)
                if (local_mean .lt. 0.0_real64) then
                    array(i,1) = array(i,1) + 0.1
                else
                    array(i,1) = array(i,1) - 0.1
                endif
            enddo
        end subroutine iterate

        function mean_value(t, n)
        !$acc routine vector
            real   (kind=REAL64), dimension(n,1), intent(inout) :: t
            integer(kind=INT32 ), intent(in)                    :: n
            real   (kind=REAL64)                                :: mean_value
            integer(kind=INT32 )                                :: i
            mean_value = 0.0_real64
            !$acc loop reduction(+:mean_value)
            do i = 1, n
                mean_value = mean_value + t(i,1)
            enddo
            mean_value = mean_value / dble(n)
        end function mean_value
end module calcul

program modular_programming
    use calcul
    implicit none

    real   (kind=REAL64), dimension(:,:), allocatable :: table
    real   (kind=REAL64), dimension(:)  , allocatable :: mean_values
    integer(kind=INT32 )                              :: nx, ny, cell_size, i

    nx =   10000
    ny =    3000
    allocate(table(nx,ny), mean_values(ny))
    table(:,:) = 0.0_real64
    call rand_init(table(1,:),ny)
    !$acc enter data copyin(table(:,:))
    cell_size = 32
    !$acc parallel loop
    do i = 2, ny
        call iterate(table(:,i), nx, cell_size)
    enddo

    !$acc parallel loop gang present(table(:,:)) copyout(mean_values(:))
    do i = 1, ny
        mean_values(i) = mean_value(table(:,i), nx)
    enddo

    do i = 1, 10
        write(0,"(a18,i5,a1,f20.8)") "Mean value of row ",i,"=",mean_values(i)
    enddo

    do i = ny-10, ny
        write(0,"(a18,i5,a1,f20.8)") "Mean value of row ",i,"=",mean_values(i)
    enddo

    !$acc exit data delete(table)
    deallocate(table, mean_values)
end program modular_programming
```