# PSyclone tutorial: NEMO API Example 4 - OpenACC

This example shows how PSyclone transformations can add OpenACC directives to the code so that it can run in parallel on GPU accelerators.

As before, let's continue with the code introduced in example 2 and create a schedule from it:

In [None]:
code = '''program test
  implicit none
  integer, parameter :: jpi=10, jpj=10, jpk=10
  real, dimension(jpi,jpj,jpk) :: a,b
  integer :: ji,jj,jk
  call timer_start()
  do jk=1,jpk
    do jj=1,jpj
      do ji=1,jpi
        b(ji,jj,jk) = 0.0
      end do
    end do
  end do
  do jk=1,jpk
    do jj=1,jpj
      do ji=1,jpi
        a(ji,jj,jk) = b(ji,jj,jk)
      end do
    end do
  end do
  call timer_end()
  write (6,*) "HELLO"
end program test'''

In [None]:
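# Parse the Fortran source held in `code` with fparser2
from fparser.common.readfortran import FortranStringReader
reader = FortranStringReader(code)
from fparser.two.parser import ParserFactory
parser = ParserFactory().create(std="f2003")
parse_tree = parser(reader)

# Build the PSy representation using the NEMO API
from psyclone.psyGen import PSyFactory
psy = PSyFactory("nemo").create(parse_tree)

# Take the first invoke and get its schedule
invoke = psy.invokes.invoke_list[0]
schedule = invoke.schedule

# Display the schedule, including the index of each child node
schedule.view()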

Now that we have created the PSyIR representation of the code, we apply two PSyclone OpenACC transformations.

The OpenACC Kernels transformation adds an OpenACC Kernels node around the two computational loops. In this case we explicitly pass the two loops that we would like to enclose as a list. The child indices used in the script correspond to the node positions shown in the schedule view above. The resulting directive tells the compiler to try to parallelise this region of code.

The OpenACC Data transformation adds an OpenACC Data region around the newly created Kernels region. This directive tells the compiler which data to copy to and from the accelerator for the specified region. Note that PSyclone determines which data must be copied in and out of this region, so the user does not need to work it out.

In [None]:
from psyclone.transformations import ACCDataTrans, ACCKernelsTrans
acc_kern_trans = ACCKernelsTrans()
acc_data_trans = ACCDataTrans()
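# Enclose the two computational loops (children 1 and 2) in a Kernels region
_, _ = acc_kern_trans.apply(schedule.children[1:3])
# Child 1 is now the new Kernels node; enclose it in a Data region
_, _ = acc_data_trans.apply([schedule.children[1]])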
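The indexing in the cell above can be easier to follow with a toy model. The sketch below is plain Python and does not use PSyclone: the list entries are hypothetical labels standing in for schedule children, and `wrap` is an illustrative helper, not a PSyclone function. It shows why `schedule.children[1:3]` selects the two loops and why, once they have been enclosed, index 1 refers to the new enclosing node.

```python
def wrap(children, start, stop, label):
    """Replace children[start:stop] with one (label, [wrapped nodes]) pair."""
    wrapped = children[start:stop]
    return children[:start] + [(label, wrapped)] + children[stop:]


# Hypothetical labels; the real node layout is whatever schedule.view() shows.
children = ["code_before", "loop_b", "loop_a", "code_after"]

# A slice [1:3] picks the entries at positions 1 and 2.
selected = children[1:3]

# Enclosing both loops leaves a single node at position 1 ...
after_kernels = wrap(children, 1, 3, "kernels")

# ... so the second step targets position 1 again.
after_data = wrap(after_kernels, 1, 2, "data")

print(selected)
print(after_kernels)
print(after_data)
```

The nesting in the final list mirrors the structure described in the text: a Data region containing a Kernels region containing the two loops.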

Looking at the PSyIR representation again, you can see that new OpenACC Kernels and Data nodes have been added in the appropriate places:

In [None]:
schedule.view()

If we've finished with our transformations, we can write out the resultant code, which can now run in parallel using OpenACC. Notice that the dependencies of the array variables in the OpenACC region have been analysed by PSyclone and the appropriate clause added to the OpenACC Data directive:

In [None]:
print(psy.gen)
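To give a feel for the kind of reasoning behind the clauses on a data directive, here is a minimal sketch in plain Python. It is not PSyclone's implementation: the function `data_clauses` and its input format are hypothetical, and it assumes that each write covers the whole array and that scalar loop variables can be ignored. An array that is read before anything in the region has written it needs its host values on the device, and any array written in the region needs to be copied back.

```python
def data_clauses(accesses):
    """Classify arrays for a data region (toy model, not PSyclone's analysis).

    accesses: ordered list of (writes, reads) pairs of sets, one pair per
    statement or loop nest in the region.
    """
    written = set()      # arrays written so far in the region
    needs_host = set()   # arrays read before any write in the region
    for writes, reads in accesses:
        # Within one statement the reads are evaluated before the writes.
        needs_host |= reads - written
        written |= writes
    return {
        "copyin": sorted(needs_host - written),
        "copyout": sorted(written - needs_host),
        "copy": sorted(written & needs_host),
    }


# The two loop nests from this example: the first writes b,
# the second reads b and writes a.
region = [({"b"}, set()), ({"a"}, {"b"})]
clauses = data_clauses(region)
print(clauses)
```

Under these assumptions neither array needs to be copied in, because `b` is fully written before it is read. The clause that PSyclone actually generates is the one printed by `psy.gen` above and may differ from this toy model.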

In the last two examples we have taken the same source code and used PSyclone transformations to transform the code to run on multi-core CPUs (in the previous example) or GPU accelerators (in this example). This approach allows scientists to write the source code once without being concerned with parallel constructs such as OpenMP and OpenACC directives, thereby helping with scientific productivity. It also allows the code to be optimised for different architectures using different PSyclone transformations, thereby supporting performance portability.

Congratulations, you have finished the NEMO section of the tutorial.