# Exercise: Fill a chess board with a GPU!

To solidify the topics covered in this module, it helps to fill in the missing components of a Hipfort program. Below is a standard 8x8 chess board:

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:70%;">
    <img src="../images/Chess_board.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">A chess board of size 8x8.</figcaption>
</figure>

You may have already completed the CPU version of the chessboard exercise. In this exercise the goal is to implement the necessary steps to fill the chessboard using a HIP kernel. 

## The exercise (TLDR version)

The Fortran source is in [chessboard_GPU.f90](chessboard_GPU.f90), and the C++ source containing the `fill_chessboard` kernel is in [kernel_code.cpp](kernel_code.cpp). Both source files have the basics already filled in. Your task is to insert the required Hipfort machinery to make all the pieces work. The steps required are:

0. Finish implementing the `fill_chessboard` kernel in [kernel_code.cpp](kernel_code.cpp).
1. Initialize the GPU in [chessboard_GPU.f90](chessboard_GPU.f90).
2. Allocate memory for the chessboard on the compute device at Fortran pointer `B_d`.
3. Launch the `fill_chessboard` kernel.
4. Copy memory from `B_d` on the compute device to `B_h` on the host.
5. Release memory for Fortran pointer `B_d`.
6. Reset the compute device.

## Choose your own adventure!

Each task may be skipped by uncommenting the `include` statement for the shortcut solution. For example, in the file [kernel_code.cpp](kernel_code.cpp), the shortcut solution may be `included` by uncommenting the following line of text:

```C++
    // Uncomment this for the shortcut solution to Step 0.
    //#include "step0_kernel.h"
```

In this way you can choose which parts of the exercise you want to complete, whether that is one part or all of them. The choice is yours! If you do use a shortcut, though, try to understand what the code in the `.h` file is doing.

## The exercise (step by step version)

### Step 0

Fill out the missing pieces of the `fill_chessboard` kernel in [kernel_code.cpp](kernel_code.cpp). You will need to: 

* Ensure a guard clause is in place to prevent the GPU running off the end of the array if the grid is larger than 8x8.
* Use multidimensional indexing to compute an offset into `B` at coordinates (i0, i1).
* Modulo arithmetic might help with the math. If we define `k` as an integer such that

```C++
int k = (i0 + i1) % 2;
```
and `light` and `dark` contain floating point values for light and dark cells, then we can use this formula to compute the value of a cell in the chessboard:

```C++
float_type scratch = ((k+1)%2)*light + (k%2)*dark;
```

It is your job to work out the correct one-dimensional index into array `B`.

### Step 1

Initialize the GPU in [chessboard_GPU.f90](chessboard_GPU.f90).

### Step 2

Allocate memory for Fortran pointer `B_d` using `hipmalloc`.



The code is half-complete and needs additional code to work properly. Clicking a file link above opens that file in the Jupyter notebook editor.

```Fortran
program chessboard
    !! Program to fill a chessboard with values and print the result

    ! Add this to use the standard Fortran environment module
    use iso_fortran_env
    use iso_c_binding

    ! Add this to make sure that all variables must be declared
    ! and the compiler performs no type inference based on
    ! the first letter of variable names
    implicit none

    ! Number of elements in the tensors
    integer, parameter :: M=8, N=8

    ! Matrix indices
    integer :: i, j

    ! Declare a chessboard here

    ! Fill the chessboard

    ! Print out the array
    do i=1,N
        do j=1,M
            ! Print values in the chessboard
            !write(*, '(F3.1,2X)', advance="no") A(i,j)
        end do
        ! Print a new line
        print *, ""
    end do

    ! Deallocate any allocated memory
    
end program chessboard
```

Using any of the concepts taught in this module, your tasks are:

* Declare a two-dimensional array **A**.
* Fill the array with values corresponding to light and dark cells.
* Clean up any declared arrays.

It is up to you to decide the data type for **A** and what values to choose for light and dark squares. The intrinsic math function `mod(a,b)` may be of use. It computes the integer remainder from the division of `a` by `b`.

## Compile and run the exercise

The cell below compiles, installs, and runs the `chessboard_GPU` program. The source already contains code to print the chessboard values, so it compiles and runs, but it doesn't produce any output yet. Edit the file [chessboard_GPU.f90](chessboard_GPU.f90) and run the cell below to compile and run the exercise.

!source ../../env; ../../install.sh; chessboard_GPU

-- hip::amdhip64 is SHARED_LIBRARY
-- Configuring done
-- Generating done
-- Build files have been written to: /home/toby/Pelagos/Projects/Hipfort_Course/build
Scanning dependencies of target memcpy_bench
[  4%] Built target memcpy_bench
Scanning dependencies of target tensoradd_simple
[  8%] Built target tensoradd_simple
Scanning dependencies of target tensoradd_allocatable
[ 13%] Built target tensoradd_allocatable
Scanning dependencies of target tensoradd_pointer
[ 17%] Built target tensoradd_pointer
Scanning dependencies of target tensoradd_function
[ 21%] Built target tensoradd_function
Scanning dependencies of target tensoradd_module
Consolidate compiler generated dependencies of target tensoradd_module
[ 30%] Built target tensoradd_module
Scanning dependencies of target tensoradd_cfun
Consolidate compiler generated dependencies of target tensoradd_cfun
[ 39%] Built

## Compile and run the answer

The file [chessboard_answer.f90](chessboard_answer.f90) contains a simple solution to the problem. You're welcome to check the code for any help you might need.

!source ../../env; ../../install.sh; chessboard_GPU_answer
