# Atomic operations

---
**Requirements:**

- [Get started](./Get_started.ipynb)
- [Data management](./Data_management.ipynb)

---

The `acc atomic` directive can be seen as a generalization of the reduction concept that we saw in [Get started](../Get_started.ipynb).
However, the underlying mechanism is different and less efficient than the one used for reductions.
So if you have the choice, use a _reduction_ clause.
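
To make the comparison concrete, here is a minimal sketch that computes the same sum twice, once with a _reduction_ clause and once with `acc atomic`. The program name, the variable names and the problem size are assumptions chosen for illustration.

```fortran
program sum_compare
    use iso_fortran_env, only : INT32, INT64
    implicit none

    integer(kind=INT32), parameter :: n = 1000000
    integer(kind=INT64)            :: sum_reduction, sum_atomic
    integer(kind=INT32)            :: i

    sum_reduction = 0
    sum_atomic    = 0

    ! Preferred: partial sums are accumulated privately and combined at the end
    !$acc parallel loop reduction(+:sum_reduction)
    do i = 1, n
        sum_reduction = sum_reduction + i
    enddo

    ! Same result with atomic: each update of the shared variable is serialized
    !$acc parallel loop copy(sum_atomic)
    do i = 1, n
        !$acc atomic update
        sum_atomic = sum_atomic + i
    enddo

    write(*,*) "reduction: ", sum_reduction, " atomic: ", sum_atomic
end program sum_compare
```

Both loops should print the same value. The `copy(sum_atomic)` clause is there because a scalar is otherwise private to each gang in a `parallel` region, so the atomic update would not act on a shared variable.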

The idea is to make sure that only one thread at a time can perform a read and/or write operation on a **shared** variable.

The syntax of the directive depends on the clause you use.

## Syntax

### _read_, _write_, _update_
```fortran
!$acc atomic <clause>
 ! One atomic operation
!$acc end atomic  ! This statement is optional
```


The clauses _read_, _write_ and _update_ apply only to the single statement immediately following the directive.

### _capture_

The _capture_ clause applies to a block of two statements:
```fortran
!$acc atomic capture
 ! One update (or write) statement and one capture statement
!$acc end atomic  ! This statement is required for capture
```

## Restrictions

The complete list of restrictions is available in the OpenACC specification.

The restrictions for each clause use the following notation:

- **v** and **x** are scalar variables of intrinsic type
- _binop_: binary operator (for example: +, -, \*, /, .and., .or.)
- _expr_ is an expression that reduces to a scalar; it must not reference **x** and is evaluated before _binop_ is applied, as if it were enclosed in parentheses
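
The notation can be illustrated with a short sketch. The names `a`, `b` and `n` are assumptions for this example, and the statements that do not match the required form are kept as comments so that the program still compiles.

```fortran
program atomic_update_forms
    use iso_fortran_env, only : REAL64, INT32
    implicit none

    integer(kind=INT32), parameter :: n = 100
    real(kind=REAL64)              :: x, a, b
    integer(kind=INT32)            :: i

    x = 0.0_real64
    a = 2.0_real64
    b = 3.0_real64

    !$acc parallel loop copy(x)
    do i = 1, n
        ! Valid: "x = x binop expr" with expr = (a*b)
        !$acc atomic update
        x = x + (a*b)

        ! Valid: "x = expr binop x", the commuted form
        !$acc atomic update
        x = (a - b) + x

        ! Not valid as atomic updates (left as comments on purpose):
        !   x = x * a + b   parsed as (x*a) + b, so the left operand of + is not x
        !   x = x + x*a     expr references x
    enddo

    write(*,*) "x = ", x
end program atomic_update_forms
```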

### _read_

The expression must be of the form:

```fortran
!$acc atomic read
v = x
!$acc end atomic   ! This statement is optional
```

### _write_

The expression must have the form:

```fortran
!$acc atomic write
x = expr
!$acc end atomic  ! This statement is optional
```

### _update_

Several forms are available:

```fortran
!$acc atomic 
x = x + (3*10)

!$acc atomic 
x = max(x, 3.0, -1.0, 2.0/5.0)  ! The update clause is optional

!$acc atomic update
x = x + (3*10) ! The end atomic statement is optional

!$acc atomic
x = x + (3*10)
!$acc end atomic
```
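
As a sketch of the intrinsic-procedure form in a complete program, the following example searches for the largest element of an array with an atomic `max` update. The array contents and names are assumptions. For a scalar maximum like this one a `reduction(max:largest)` clause would normally be the better choice, so the atomic version is shown only to illustrate the syntax.

```fortran
program atomic_max
    use iso_fortran_env, only : REAL64, INT32
    implicit none

    integer(kind=INT32), parameter  :: n = 1000
    real(kind=REAL64), dimension(n) :: values
    real(kind=REAL64)               :: largest
    integer(kind=INT32)             :: i

    ! Arbitrary test data
    do i = 1, n
        values(i) = sin(real(i, kind=REAL64))
    enddo

    largest = -huge(largest)

    !$acc parallel loop copyin(values(:)) copy(largest)
    do i = 1, n
        ! x = intrinsic_procedure(x, expr) form of the update
        !$acc atomic update
        largest = max(largest, values(i))
    enddo

    write(*,*) "max = ", largest
end program atomic_max
```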

### _capture_

A capture stores the value of **x** in **v**, either before or after **x** is updated or overwritten, as a single atomic operation:
```fortran
! x = x operator expr ( update statement)
! v = x               (capture statement)
!$acc atomic capture
x = x + (3*10)
v = x
!$acc end atomic

! x = intrinsic_procedure(x,scalar_expr_list) ( update statement)
! v = x                                       (capture statement)
!$acc atomic capture
x = max(x, 3.0, -1.0, 2.0/5.0)
v = x
!$acc end atomic


! v = x               (capture statement)
! x = x operator expr ( update statement)
!$acc atomic capture
v = x
x = x + (3*10)
!$acc end atomic

! v = x                                       (capture statement)
! x = intrinsic_procedure(x,scalar_expr_list) ( update statement)
!$acc atomic capture
v = x 
x = max(3.0, -1.0, 2.0/5.0, x)
!$acc end atomic


! v = x    (capture statement)
! x = expr (  write statement)
!$acc atomic capture
v = x
x = 3*10
!$acc end atomic
```
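
A common use of _capture_ is to hand out unique positions in an output array. The sketch below, with assumed names (`input`, `selected`, `nkept`, `slot`) and an arbitrary selection criterion, keeps the multiples of 3 from an input array.

```fortran
program atomic_capture_compact
    use iso_fortran_env, only : INT32
    implicit none

    integer(kind=INT32), parameter    :: n = 1000
    integer(kind=INT32), dimension(n) :: input, selected
    integer(kind=INT32)               :: i, nkept, slot

    do i = 1, n
        input(i) = i
    enddo
    selected(:) = 0
    nkept = 0

    !$acc parallel loop copyin(input(:)) copy(selected(:), nkept) private(slot)
    do i = 1, n
        if (mod(input(i), 3) == 0) then
            ! Increment the shared counter and capture its new value atomically,
            ! so that no two iterations receive the same slot
            !$acc atomic capture
            nkept = nkept + 1
            slot = nkept
            !$acc end atomic
            selected(slot) = input(i)
        endif
    enddo

    write(*,*) "kept ", nkept, " values"
end program atomic_capture_compact
```

Each selected value lands in its own slot, but the order of the values in `selected` is not deterministic when the loop runs in parallel.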

## Exercise

Let's check if the default random number generator provided by the standard library gives good results.

In the example we generate an array of integers randomly set from 1 to 10.
The purpose is to check if we have a uniform distribution.

We cannot perform the initialization on the GPU since the random number generator (`random_number` here) is not OpenACC aware.

You have to:

- Create a kernel that counts the occurrences of each integer
- Make sure that the results are correct (you should have around 10% for each number)
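
One possible way to check the results is sketched below on the CPU, with a smaller number of shots. The tolerance value and the names `bin` and `fraction` are assumptions. The first test is the most useful one for this exercise: if updates of `histo` are lost because of concurrent writes, the bins no longer sum to `nshots`.

```fortran
program check_histogram
    use iso_fortran_env, only : REAL64, INT32
    implicit none

    integer(kind=INT32), parameter     :: nshots = 1000000
    real(kind=REAL64), parameter       :: tolerance = 0.01_real64  ! assumed threshold
    integer(kind=INT32), dimension(10) :: histo
    real(kind=REAL64)                  :: random_real, fraction
    integer(kind=INT32)                :: i, bin

    histo(:) = 0
    do i = 1, nshots
        call random_number(random_real)
        bin = floor(random_real * 10.0_real64) + 1
        histo(bin) = histo(bin) + 1
    enddo

    ! Every shot must be counted exactly once
    if (sum(histo) /= nshots) then
        write(0,*) "Error: counted ", sum(histo), " shots instead of ", nshots
    endif

    ! Each bin should hold roughly 10% of the shots
    do i = 1, 10
        fraction = real(histo(i), kind=REAL64) / real(nshots, kind=REAL64)
        if (abs(fraction - 0.1_real64) > tolerance) then
            write(0,*) "Bin ", i, " deviates from 10%: ", fraction
        endif
    enddo
end program check_histogram
```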

Example stored in: `../../examples/Fortran/atomic_exercise.f90`

In [None]:
%%idrrun -a
program histogram
    use iso_fortran_env, only : REAL64, INT32
    implicit none

    integer(kind=INT32 ), dimension(:) , allocatable :: shots
    integer(kind=INT32 ), dimension(10)              :: histo
    integer(kind=INT32 ), parameter                  :: nshots = 1e9
    real   (kind=REAL64)                             :: random_real
    integer(kind=INT32 )                             :: i

    ! Histogram allocation and initialization
    do i = 1, 10
        histo(i) = 0
    enddo

    ! Allocate memory for the random numbers
    allocate(shots(nshots))

    ! Fill the array on the CPU (random_number is not available on GPU with NVIDIA compilers)
    do i = 1, nshots
        call random_number(random_real)
        shots(i) = floor(random_real * 10.0_real64) + 1
    enddo

    ! Count the number of times each number was drawn
    do i = 1, nshots
        histo(shots(i)) = histo(shots(i)) + 1
    enddo

    !  Print results
    do i = 1, 10
        write(0,"(i2,a2,i10,a2,f10.8,a1)") i,": ", histo(i), " (", real(histo(i))/real(nshots), ")"
    enddo

    deallocate(shots)

end program histogram

### Solution

Example stored in: `../../examples/Fortran/atomic_solution.f90`

In [None]:
%%idrrun -a
program histogram
    use iso_fortran_env, only : REAL64, INT32
    use openacc
    implicit none

    integer(kind=INT32 ), dimension(:) , allocatable :: shots
    integer(kind=INT32 ), dimension(10)              :: histo
    integer(kind=INT32 ), parameter                  :: nshots = 1e9
    real   (kind=REAL64)                             :: random_real
    integer(kind=INT32 )                             :: i

    ! Histogram allocation and initialization
    do i = 1, 10
        histo(i) = 0
    enddo

    ! Allocate memory for the random numbers
    allocate(shots(nshots))

    ! Fill the array on the CPU (random_number is not available on GPU with NVIDIA compilers)
    do i = 1, nshots
        call random_number(random_real)
        shots(i) = floor(random_real * 10.0_real64) + 1
    enddo

    ! Count the number of times each number was drawn
    !$acc parallel loop copyin(shots(:)) copyout(histo(:))
    do i = 1, nshots
        !$acc atomic 
        histo(shots(i)) = histo(shots(i)) + 1
    enddo

    !  Print results
    do i = 1, 10
        write(0,"(i2,a2,i10,a2,f10.8,a1)") i,": ", histo(i), " (", real(histo(i))/real(nshots), ")"
    enddo

    deallocate(shots)

end program histogram

#### Important Note

With recent NVIDIA compilers you can use a reduction on arrays. It is more efficient than using atomic operations.
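
A minimal sketch of the array-reduction variant follows. It assumes a compiler that accepts arrays in the _reduction_ clause, and it uses a smaller `nshots` than the exercise. The program name is hypothetical.

```fortran
program histogram_reduction
    use iso_fortran_env, only : REAL64, INT32
    implicit none

    integer(kind=INT32), parameter                 :: nshots = 1000000
    integer(kind=INT32), dimension(:), allocatable :: shots
    integer(kind=INT32), dimension(10)             :: histo
    real(kind=REAL64)                              :: random_real
    integer(kind=INT32)                            :: i

    histo(:) = 0
    allocate(shots(nshots))

    ! Fill the array on the CPU
    do i = 1, nshots
        call random_number(random_real)
        shots(i) = floor(random_real * 10.0_real64) + 1
    enddo

    ! The whole histo array is the reduction variable: no atomic is needed
    !$acc parallel loop copyin(shots(:)) reduction(+:histo)
    do i = 1, nshots
        histo(shots(i)) = histo(shots(i)) + 1
    enddo

    do i = 1, 10
        write(0,*) i, ": ", histo(i)
    enddo

    deallocate(shots)
end program histogram_reduction
```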