# Exercise - Hadamard matrix multiplication gone wrong!

In this exercise we use what we have learned so far to find an error in a HIP program. We revisit Hadamard multiplication, where the values at coordinates (i0,i1) in matrices **D** and **E** are multiplied together to set the value at coordinates (i0,i1) in matrix **F**.
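For reference, a serial CPU version of the Hadamard product might look like the sketch below. It assumes each N0 × N1 matrix is stored row-major in a flat array, so element (i0,i1) sits at offset `i0 * N1 + i1`. The function name `hadamard_reference` is hypothetical and does not come from the exercise source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for the Hadamard (elementwise) product F = D o E.
// Matrices are stored row-major as flat arrays of N0 rows by N1 columns,
// so element (i0, i1) lives at offset i0 * N1 + i1.
std::vector<float> hadamard_reference(const std::vector<float>& D,
                                      const std::vector<float>& E,
                                      std::size_t N0, std::size_t N1) {
    assert(D.size() == N0 * N1 && E.size() == N0 * N1);
    std::vector<float> F(N0 * N1);
    for (std::size_t i0 = 0; i0 < N0; ++i0) {
        for (std::size_t i1 = 0; i1 < N1; ++i1) {
            const std::size_t offset = i0 * N1 + i1;
            F[offset] = D[offset] * E[offset];
        }
    }
    return F;
}
```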

<figure style="margin-left:auto; margin-right:auto; width:80%;">
    <img style="vertical-align:middle" src="../images/elementwise_multiplication.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Elementwise multiplication of matrices D and E to get F.</figcaption>
</figure>

The source code is located in [mat_elementwise.cpp](mat_elementwise_buggy.cpp). The program is almost identical to the matrix multiplication example; only the kernel implementation differs.

The steps are:

1. Parse program arguments
1. Discover resources and choose a compute device
1. Construct matrices **D_h** and **E_h** on the host and fill them with random numbers
1. Allocate memory for arrays **D_d**, **E_d**, and **F_d** on the compute device
1. Upload matrices **D_h** and **E_h** from the host to **D_d** and **E_d** on the device
1. Run the kernel to compute **F_d** from **D_d** and **E_d** on the device
1. Copy the buffer for matrix **F_d** on the device back to **F_h** on the host
1. Test the computed matrix **F_h** against a known answer
1. Write the contents of matrices **D_h**, **E_h**, and **F_h** to disk
1. Clean up memory allocations and release resources
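Steps 3 to 8 can be mirrored on the host alone, which can help when reasoning about buffer sizes. This is a hypothetical sketch: `mock_pipeline` is not part of the exercise, `std::vector` stands in for device allocations, and `std::memcpy` stands in for `hipMemcpy`. Both copy functions take a size in bytes rather than in elements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <random>
#include <vector>

// Hypothetical host-only mock of steps 3 to 8. No HIP calls are made:
// std::vector stands in for device buffers and std::memcpy for hipMemcpy.
std::vector<float> mock_pipeline(std::size_t N0, std::size_t N1, unsigned seed) {
    const std::size_t nelements = N0 * N1;
    const std::size_t nbytes = nelements * sizeof(float);  // copy sizes are in bytes
    assert(nelements > 0);

    // Step 3: host matrices filled with random numbers
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> D_h(nelements), E_h(nelements), F_h(nelements);
    for (std::size_t n = 0; n < nelements; ++n) {
        D_h[n] = dist(gen);
        E_h[n] = dist(gen);
    }

    // Step 4: stand-ins for the device buffers D_d, E_d and F_d
    std::vector<float> D_d(nelements), E_d(nelements), F_d(nelements);

    // Step 5: "upload" host to device
    std::memcpy(D_d.data(), D_h.data(), nbytes);
    std::memcpy(E_d.data(), E_h.data(), nbytes);

    // Step 6: "kernel", one iteration per element
    for (std::size_t n = 0; n < nelements; ++n) {
        F_d[n] = D_d[n] * E_d[n];
    }

    // Step 7: "download" device to host
    std::memcpy(F_h.data(), F_d.data(), nbytes);

    // Step 8: test against products computed directly on the host
    for (std::size_t n = 0; n < nelements; ++n) {
        assert(std::fabs(F_h[n] - D_h[n] * E_h[n]) <= 1e-6f);
    }
    return F_h;
}
```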

## Run the solution

If we run the solution, it computes **F** by elementwise multiplication of matrices **D** and **E**. There is little or no residual between the computed matrix **F_h** and **F_answer_h**, the answer computed by serial code on the CPU.
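One simple way to quantify "little or no residual" is the largest absolute difference between corresponding elements. The sketch below assumes both matrices are flat `std::vector<float>` arrays of equal length; `max_abs_residual` is a hypothetical helper, and the exercise's own test may use a different metric.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest absolute difference between a computed
// matrix and a reference answer, both stored as flat arrays.
float max_abs_residual(const std::vector<float>& F_h,
                       const std::vector<float>& F_answer_h) {
    assert(F_h.size() == F_answer_h.size());
    float max_res = 0.0f;
    for (std::size_t n = 0; n < F_h.size(); ++n) {
        max_res = std::fmax(max_res, std::fabs(F_h[n] - F_answer_h[n]));
    }
    return max_res;
}
```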

```bash
make mat_elementwise_answer.exe; ./mat_elementwise_answer.exe
```

```text
make: 'mat_elementwise_answer.exe' is up to date.
Device id: 0
	name:                                    NVIDIA GeForce RTX 3060 Laptop GPU
	global memory size:                      6226 MB
	available registers per block:           65536 
	maximum shared memory size per block:    49 KB
	maximum pitch size for memory copies:    2147 MB
	max block size:                          (1024,1024,64)
	max threads in a block:                  1024
	max Grid size:                           (2147483647,65535,65535)
The output array F_h (as computed with HIP) is
--------
|  4.50e-01  1.70e-01  2.77e-01  2.21e-02  2.46e-02  3.48e-02  4.41e-02  2.05e-01 |
|  7.57e-01  4.06e-03  3.90e-01  2.74e-01  3.16e-01  3.38e-05  9.45e-02  9.03e-01 |
|  1.60e-02  6.24e-03  9.69e-02  4.00e-01  4.89e-01  4.12e-01  8.46e-01  8.93e-02 |
|  3.23e-01  3.19e-02  2.84e-01  4.18e-01  2.02e-02  3.38e-01  2.30e-01  1.49e-01 |
--------
The CPU solution (F_answer_h) is 
--------
|  4.50e-01  1.70e-01  2.77e-01  2.21e-02  2.46e
```

## Run the buggy application

Now run the application that has some bugs in it.

```bash
make mat_elementwise.exe; ./mat_elementwise.exe
```

```text
make: 'mat_elementwise.exe' is up to date.
Device id: 0
	name:                                    NVIDIA GeForce RTX 3060 Laptop GPU
	global memory size:                      6226 MB
	available registers per block:           65536 
	maximum shared memory size per block:    49 KB
	maximum pitch size for memory copies:    2147 MB
	max block size:                          (1024,1024,64)
	max threads in a block:                  1024
	max Grid size:                           (2147483647,65535,65535)
The output array F_h (as computed with HIP) is
--------
|  4.50e-01  1.70e-01  2.77e-01  2.21e-02  2.46e-02  3.48e-02  4.41e-02  2.05e-01 |
|  7.57e-01  4.06e-03  3.90e-01  2.74e-01  3.16e-01  3.38e-05  9.45e-02  9.03e-01 |
|  1.60e-02  6.24e-03  9.69e-02  4.00e-01  4.89e-01  4.12e-01  8.46e-01  8.93e-02 |
|  3.23e-01  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00 |
--------
The CPU solution (F_answer_h) is 
--------
|  4.50e-01  1.70e-01  2.77e-01  2.21e-02  2.46e-02  3.
```

For some reason, every element in the last row of **F_h** except the first is zero, which is incorrect!
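Listing the coordinates of the incorrect elements can help narrow down where the kernel or the memory copies go astray. The sketch below is hypothetical (`find_mismatches` is not part of the exercise) and assumes row-major storage with N1 columns, so a flat offset `n` maps back to coordinates (`n / N1`, `n % N1`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: return the (i0, i1) coordinates of every element where
// the computed matrix differs from the reference by more than tol.
// Assumes row-major storage, so flat offset n maps to (n / N1, n % N1).
std::vector<std::pair<std::size_t, std::size_t>>
find_mismatches(const std::vector<float>& F_h,
                const std::vector<float>& F_answer_h,
                std::size_t N1, float tol) {
    assert(F_h.size() == F_answer_h.size() && N1 > 0);
    std::vector<std::pair<std::size_t, std::size_t>> bad;
    for (std::size_t n = 0; n < F_h.size(); ++n) {
        if (std::fabs(F_h[n] - F_answer_h[n]) > tol) {
            bad.emplace_back(n / N1, n % N1);
        }
    }
    return bad;
}
```

Printing the flat offsets alongside the coordinates can also reveal whether the bad elements form one contiguous run in memory.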

## Tasks

Your task is to find the error using any of the techniques from the lesson. You can of course check (or diff) the answer [mat_elementwise_answer.cpp](mat_elementwise_answer.cpp) if you get frustrated, but then try to understand **how** these changes broke the solution.

### Hint

The block size is (3,3,1) and the matrix size is (4,8). We use row-major ordering, so dimension `x` of the grid corresponds to dimension `1` of the matrices, and dimension `y` of the grid corresponds to dimension `0` of the matrices. The minimum grid size is then (9,6,1) threads, i.e. (3,2,1) blocks along each dimension. In the coordinate system of the matrices, this grid maps to (6,9).
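The arithmetic in the hint can be checked on the host. In this hypothetical sketch, `Dim3` is a stand-in for HIP's `dim3` and the helper names are illustrative. It rounds each matrix dimension up to a multiple of the block size, then visits every emulated thread and counts those that pass a bounds check of the form `i0 < N0 && i1 < N1`, assuming the kernel guards its write that way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for HIP's dim3
struct Dim3 { std::size_t x, y, z; };

// Smallest multiple of block that is >= n
std::size_t round_up(std::size_t n, std::size_t block) {
    return ((n + block - 1) / block) * block;
}

// Global grid size in threads: x covers matrix dimension 1 (N1 columns),
// y covers matrix dimension 0 (N0 rows)
Dim3 global_size(std::size_t N0, std::size_t N1, Dim3 block) {
    return {round_up(N1, block.x), round_up(N0, block.y), 1};
}

// Emulate every thread in the grid: i1 comes from the x index and i0 from
// the y index. Count the threads that fall inside the matrix.
std::size_t count_active_threads(std::size_t N0, std::size_t N1, Dim3 block) {
    const Dim3 g = global_size(N0, N1, block);
    std::size_t active = 0;
    for (std::size_t gy = 0; gy < g.y; ++gy) {
        for (std::size_t gx = 0; gx < g.x; ++gx) {
            const std::size_t i0 = gy, i1 = gx;
            if (i0 < N0 && i1 < N1) {
                ++active;  // this thread would write F[i0 * N1 + i1]
            }
        }
    }
    return active;
}
```

For a (4,8) matrix and (3,3,1) blocks, the grid holds 9 × 6 = 54 threads, of which 32 (one per matrix element) pass the bounds check and the remaining 22 should do nothing.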

<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a> for the Pawsey Supercomputing Centre
</address>