# Exercise: Convert the GPU chessboard code to managed memory

In the previous exercise we modified the `chessboard_CPU` code to run on the GPU using manual memory management. In this exercise we will use the memory management material from Lesson 6 to convert that GPU code to managed memory.

Below is a standard 8x8 chess board:

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:70%;">
    <img src="../images/Chess_board.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">A chess board of size 8x8.</figcaption>
</figure>
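The alternating pattern in the figure can be written as `(row + col) mod 2`. The following is a minimal host-only C++ sketch of that rule, which may be handy as a reference when checking GPU results. It assumes the light and dark squares are stored as `0.0` and `1.0` with `0.0` in the first cell, as in the program output shown later in this exercise. The names `reference_board` and `count_mismatches` are hypothetical helpers and are not part of the course code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: build an n x n chessboard as a flat array.
// Assumption: cell (row, col) holds (row + col) % 2 stored as a float,
// so the first cell is 0.0 and its neighbours are 1.0.
std::vector<float> reference_board(int n) {
    std::vector<float> board(static_cast<std::size_t>(n) * n);
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            board[static_cast<std::size_t>(row) * n + col] =
                static_cast<float>((row + col) % 2);
        }
    }
    return board;
}

// Hypothetical helper: count the cells of `got` that differ from the
// reference pattern. A result of 0 means the board matches the pattern.
std::size_t count_mismatches(const std::vector<float>& got, int n) {
    const std::vector<float> expected = reference_board(n);
    if (got.size() != expected.size()) {
        return expected.size();  // wrong size: treat every cell as wrong
    }
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (got[i] != expected[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Because the rule is symmetric in `row` and `col`, the same reference applies whether the array is stored in row-major (C++) or column-major (Fortran) order.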

In [2]:
import os
os.environ['PATH'] = f"{os.environ['PATH']}:../../../install/bin"

## The exercise (TLDR version)

The Fortran source is in [chessboard_mm.f90](chessboard_mm.f90), and the C++ source that contains the `fill_chessboard` kernel is in [kernel_code.cpp](kernel_code.cpp). Both source files work and produce a correct result. Your task is to convert the memory allocation steps to use managed memory. The steps required are:

0. Replace calls to `hipMalloc` with calls to `hipMallocManaged`, using the right arguments.
1. Make sure the right memory is being passed to the kernel.
2. Remove the call to `hipMemcpy`. Why is it no longer needed?
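As a hint, the steps above can be sketched in plain HIP C++. This is only an illustration of the general pattern, not the course solution: the exercise itself is written in Fortran, and the kernel name `fill_pattern`, the `HIP_CHECK` macro, and the launch configuration are assumptions made for this sketch. It needs `hipcc` and a HIP-capable device to build and run.

```cpp
// Sketch only: build with hipcc on a machine with a HIP-capable GPU.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical error-checking macro: abort with a message if a HIP call fails.
#define HIP_CHECK(call)                                                \
    do {                                                               \
        hipError_t err_ = (call);                                      \
        if (err_ != hipSuccess) {                                      \
            std::fprintf(stderr, "HIP error: %s (%s:%d)\n",            \
                         hipGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical kernel (not the course's fill_chessboard):
// writes (row + col) % 2 into each cell of an n x n board.
__global__ void fill_pattern(float* board, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        board[row * n + col] = static_cast<float>((row + col) % 2);
    }
}

int main() {
    const int n = 8;
    const size_t nbytes = static_cast<size_t>(n) * n * sizeof(float);

    // Step 0: a single managed allocation takes the place of the
    // separate host buffer and the hipMalloc device buffer.
    float* board = nullptr;
    HIP_CHECK(hipMallocManaged(reinterpret_cast<void**>(&board), nbytes));

    // Step 1: the managed pointer itself is passed to the kernel.
    dim3 block(8, 8);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    fill_pattern<<<grid, block>>>(board, n);
    HIP_CHECK(hipGetLastError());

    // Step 2: no hipMemcpy. The host reads the same allocation, but it
    // must wait for the kernel to finish first.
    HIP_CHECK(hipDeviceSynchronize());

    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            std::printf("%.1f  ", board[row * n + col]);
        }
        std::printf("\n");
    }

    HIP_CHECK(hipFree(board));
    return 0;
}
```

A managed allocation is addressable from both the host and the device, so no explicit copy is needed. What replaces the copy is a synchronization point (here `hipDeviceSynchronize`) before the host touches the data.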

## Compile and run the exercise

The code below compiles, installs and runs the `chessboard_mm` program. Until all the pieces are in place, the code doesn't produce meaningful output.

In [12]:
!build chessboard_mm; run chessboard_mm 

[100%] Built target chessboard_mm
[  2%] Built target tensoradd_simple
[  5%] Built target tensoradd_allocatable
[  7%] Built target tensoradd_pointer
[ 10%] Built target tensoradd_function
[ 15%] Built target tensoradd_module
[ 20%] Built target tensoradd_cfun
[ 26%] Built target tensoradd_hip_cptr
[ 32%] Built target tensoradd_hip_fptr
[ 40%] Built target tensoradd_hip_oo
[ 46%] Built target tensoradd_hip_fptr_managed
[ 49%] Built target chessboard_CPU_answer
[ 51%] Built target chessboard_CPU
[ 56%] Built target chessboard_GPU
[ 62%] Built target chessboard_GPU_answer
[ 67%] Built target chessboard_mm
[ 72%] Built target chessboard_mm_answer
[ 77%] Built target paged_mem
[ 82%] Built target pinned_mem
[ 87%] Built target managed_mem
[ 92%] Built target memcpy_sync
[ 97%] Built target memcpy_async
[100%] Built target memcpy_bench
Install the project...
-- Install configuration: "DEBUG"
0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0   
1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0   
0.0  1

## Compile and run the answer

The file [chessboard_answer.f90](chessboard_answer.f90) contains a simple solution to the problem. You're welcome to check the code for any help you might need.

In [13]:
!build chessboard_mm_answer; run chessboard_mm_answer

[100%] Built target chessboard_mm_answer
[  2%] Built target tensoradd_simple
[  5%] Built target tensoradd_allocatable
[  7%] Built target tensoradd_pointer
[ 10%] Built target tensoradd_function
[ 15%] Built target tensoradd_module
[ 20%] Built target tensoradd_cfun
[ 26%] Built target tensoradd_hip_cptr
[ 32%] Built target tensoradd_hip_fptr
[ 40%] Built target tensoradd_hip_oo
[ 46%] Built target tensoradd_hip_fptr_managed
[ 49%] Built target chessboard_CPU_answer
[ 51%] Built target chessboard_CPU
[ 56%] Built target chessboard_GPU
[ 62%] Built target chessboard_GPU_answer
[ 67%] Built target chessboard_mm
[ 72%] Built target chessboard_mm_answer
[ 77%] Built target paged_mem
[ 82%] Built target pinned_mem
[ 87%] Built target managed_mem
[ 92%] Built target memcpy_sync
[ 97%] Built target memcpy_async
[100%] Built target memcpy_bench
Install the project...
-- Install configuration: "DEBUG"
0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0   
1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0   