# Exercise - Using rectangular copies for Hadamard matrix multiplication

In Hadamard (elementwise) matrix multiplication, the values of matrices **D** and **E** at coordinates (i0,i1) are multiplied together to give the value at coordinates (i0,i1) in matrix **F**. All three matrices must therefore have the same shape.

<figure style="margin-left:auto; margin-right:auto; width:80%;">
    <img style="vertical-align:middle" src="../images/elementwise_multiplication.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Elementwise multiplication of matrices D and E to get F.</figcaption>
</figure>
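The sequential CPU code that produces the reference answer can be sketched as a double loop over (i0,i1). This is a minimal sketch, assuming row-major storage where element (i0,i1) of an N0 x N1 matrix sits at offset `i0 * N1 + i1`. The function name `hadamard_host` is hypothetical and is not taken from [mat_elementwise.cpp](mat_elementwise.cpp).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the Hadamard (elementwise) product F = D o E.
// Assumes row-major storage: element (i0, i1) lives at index i0 * N1 + i1.
std::vector<float> hadamard_host(const std::vector<float>& D,
                                 const std::vector<float>& E,
                                 std::size_t N0, std::size_t N1) {
    std::vector<float> F(N0 * N1);
    for (std::size_t i0 = 0; i0 < N0; ++i0) {
        for (std::size_t i1 = 0; i1 < N1; ++i1) {
            std::size_t offset = i0 * N1 + i1;
            // Each output element depends only on the inputs at the same (i0, i1).
            F[offset] = D[offset] * E[offset];
        }
    }
    return F;
}
```

Because every output element is independent, the same computation maps naturally onto one OpenCL work-item per (i0,i1) pair.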

The steps are: 

1. Device discovery and selection.
1. Command queues created.
1. Matrices **D_h** and **E_h** allocated on the host and filled with random numbers.
1. Matrices **D_d** and **E_d** allocated on the compute device.
1. Programs built, kernels created and kernel arguments set.
1. Matrices **D_h** and **E_h** uploaded to device allocations **D_d** and **E_d**.
1. The kernel **mat_elementwise** is run on the device to compute **F_d** from **D_d** and **E_d**.
1. **F_d** is copied to **F_h** and compared with the solution **F_answer_h** from sequential CPU code.
1. Memory and device cleanup.
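The step that compares **F_h** with **F_answer_h** can be done by finding the largest absolute difference between the two matrices. The sketch below assumes both are stored as contiguous `float` arrays of the same length; the function name `max_abs_error` is hypothetical and the exercise code may perform its check differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical result check: largest absolute difference between the
// device result F_h and the sequential CPU answer F_answer_h.
float max_abs_error(const float* F_h, const float* F_answer_h,
                    std::size_t nelements) {
    float max_err = 0.0f;
    for (std::size_t n = 0; n < nelements; ++n) {
        max_err = std::max(max_err, std::fabs(F_h[n] - F_answer_h[n]));
    }
    return max_err;
}
```

Comparing the result against a small tolerance, rather than requiring exact equality, keeps the check robust if the device and host round intermediate results differently.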

Using rectangular copies is an important skill to master, especially when you decompose a problem into sections that are handled by different devices. A rectangular copy transfers a 2D (or 3D) block of memory row by row, using an origin, a region and row pitches to locate the block in both the device buffer and host memory. In this exercise we are going to enable the elementwise matrix multiplication code to use a **rectangular copy** to copy the memory allocation **F_d** back to the host (**F_h**). The source code to edit is located in [mat_elementwise.cpp](mat_elementwise.cpp) and the kernel is in [kernels_elementwise.c](kernels_elementwise.c). Your task is to change the code so that the copy from **F_d** back to **F_h** uses a **rectangular** copy ([clEnqueueReadBufferRect](https://www.khronos.org/registry/OpenCL/sdk/3.0/docs/man/html/clEnqueueReadBufferRect.html)) instead of the standard contiguous copy.
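To see what a rectangular copy does, it can help to emulate its address arithmetic on the host. The sketch below is a plain C++ emulation, not the OpenCL API: `read_rect_emulated` is a hypothetical name, and it assumes a 2D copy (a single slice, so slice pitches are left out). Following the clEnqueueReadBufferRect documentation, component 0 of each origin and of the region is measured in bytes, component 1 counts rows, and a row pitch of 0 means rows are tightly packed.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-side emulation of the offset arithmetic behind
// clEnqueueReadBufferRect, restricted to 2D copies (one slice).
// origin[0] and region[0] are in bytes; origin[1] and region[1] are in rows.
void read_rect_emulated(const unsigned char* buffer,
                        const std::size_t buffer_origin[2],
                        const std::size_t host_origin[2],
                        const std::size_t region[2],
                        std::size_t buffer_row_pitch,
                        std::size_t host_row_pitch,
                        unsigned char* host) {
    // A row pitch of 0 means "tightly packed", i.e. the row length region[0].
    if (buffer_row_pitch == 0) buffer_row_pitch = region[0];
    if (host_row_pitch == 0) host_row_pitch = region[0];

    for (std::size_t row = 0; row < region[1]; ++row) {
        std::size_t src = (buffer_origin[1] + row) * buffer_row_pitch + buffer_origin[0];
        std::size_t dst = (host_origin[1] + row) * host_row_pitch + host_origin[0];
        // Each row of the rectangle is one contiguous run of region[0] bytes.
        std::memcpy(host + dst, buffer + src, region[0]);
    }
}
```

Copying all of an N0 x N1 matrix is the special case where both origins are zero and the region spans every row and column, which moves the same bytes as a contiguous copy. The rectangular form pays off when only a sub-block is needed, such as the share of a matrix assigned to one device.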

## Run the exercise code

As it stands, the code produces the right answer, but it uses a standard contiguous copy to copy **F_d** back to **F_h**.

```
!make clean; make ./mat_elementwise.exe; ./mat_elementwise.exe
```

```
rm -r *.exe
g++ -std=c++11 -g -O2 -fopenmp -I/usr/include -I../include -L/usr/lib/x86_64-linux-gnu mat_elementwise.cpp\
	-o mat_elementwise.exe -lOpenCL
In file included from mat_elementwise.cpp:15:
../include/cl_helper.hpp: In function ‘_cl_command_queue** h_create_command_queues(_cl_device_id**, _cl_context**, cl_uint, cl_uint, cl_bool, cl_bool)’:
  339 |         command_queues[n] = clCreateCommandQueue(
      |                             ~~~~~~~~~~~~~~~~~~~~^
  340 |             contexts[n % num_devices],
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~
  341 |             devices[n % num_devices],
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~
  342 |             queue_properties,
      |             ~~~~~~~~~~~~~~~~~
  343 |             &errcode
```

## Tasks

1. Load up the documentation for [clEnqueueReadBufferRect](https://www.khronos.org/registry/OpenCL/sdk/3.0/docs/man/html/clEnqueueReadBufferRect.html).
1. At [mat_mult_local.cpp:190](mat_mult_local.cpp) there is an example of a rectangular copy using [clEnqueueWriteBufferRect](https://www.khronos.org/registry/OpenCL/sdk/3.0/docs/man/html/clEnqueueWriteBufferRect.html). Copy that code into [mat_elementwise.cpp](mat_elementwise.cpp) and adapt it to read **F_d** back into **F_h** with clEnqueueReadBufferRect.
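Most of the work in adapting the write example is filling in the origin, region and pitch arguments. The sketch below computes them for a block of a row-major `float` matrix with `N1` columns, assuming **F_d** and **F_h** share that layout. `RectArgs` and `make_rect_args` are hypothetical helpers, and the names in the commented call (`command_queue`, `N0_F`, `N1_F`) are assumptions to check against [mat_elementwise.cpp](mat_elementwise.cpp).

```cpp
#include <cstddef>

// Hypothetical helper holding the arguments clEnqueueReadBufferRect expects
// for a block of a row-major float matrix with N1 columns. For each 3-element
// array, component 0 is in bytes, component 1 in rows, component 2 in slices.
struct RectArgs {
    std::size_t buffer_origin[3];
    std::size_t host_origin[3];
    std::size_t region[3];
    std::size_t row_pitch;  // bytes per matrix row, same on device and host
};

// Block of nrows x ncols elements starting at element (row0, col0).
RectArgs make_rect_args(std::size_t N1, std::size_t row0, std::size_t col0,
                        std::size_t nrows, std::size_t ncols) {
    const std::size_t elem = sizeof(float);
    RectArgs args;
    args.buffer_origin[0] = col0 * elem;
    args.buffer_origin[1] = row0;
    args.buffer_origin[2] = 0;
    // Same block position on the host, since F_h is assumed to mirror F_d.
    args.host_origin[0] = col0 * elem;
    args.host_origin[1] = row0;
    args.host_origin[2] = 0;
    args.region[0] = ncols * elem;
    args.region[1] = nrows;
    args.region[2] = 1;  // a 2D copy still needs a depth of 1
    args.row_pitch = N1 * elem;
    return args;
}

// Intended use (sketch; command_queue, F_d, F_h, N0_F and N1_F are assumed names):
//   RectArgs a = make_rect_args(N1_F, 0, 0, N0_F, N1_F);  // whole of F
//   cl_int errcode = clEnqueueReadBufferRect(
//       command_queue, F_d, CL_TRUE,
//       a.buffer_origin, a.host_origin, a.region,
//       a.row_pitch, 0,  // buffer row pitch, buffer slice pitch (0 = computed)
//       a.row_pitch, 0,  // host row pitch, host slice pitch (0 = computed)
//       F_h, 0, NULL, NULL);
```

For this exercise the whole-matrix block is enough; a non-zero `row0` or `col0` selects the kind of sub-block you would hand to one device when splitting **F** across several devices.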

### Answer

You can always look at the answer in [mat_elementwise_answer.cpp](mat_elementwise_answer.cpp) and run the code, but make sure you then understand why the solution works.

```
!make mat_elementwise_answer.exe; ./mat_elementwise_answer.exe
```

```
g++ -std=c++11 -g -O2 -fopenmp -I/usr/include -I../include -L/usr/lib/x86_64-linux-gnu mat_elementwise_answer.cpp\
	-o mat_elementwise_answer.exe -lOpenCL
In file included from mat_elementwise_answer.cpp:15:
../include/cl_helper.hpp: In function ‘_cl_command_queue** h_create_command_queues(_cl_device_id**, _cl_context**, cl_uint, cl_uint, cl_bool, cl_bool)’:
  339 |         command_queues[n] = clCreateCommandQueue(
      |                             ~~~~~~~~~~~~~~~~~~~~^
  340 |             contexts[n % num_devices],
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~
  341 |             devices[n % num_devices],
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~
  342 |             queue_properties,
      |             ~~~~~~~~~~~~~~~~~
  343 |             &errcode
```

<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a> for the Pawsey Supercomputing Centre
</address>