# Exercise - Hadamard matrix multiplication gone wrong!

In this exercise we are going to use what we know to try to find the errors in an OpenCL program. The code performs a Hadamard matrix multiplication, where the values in matrices **D** and **E** at coordinates (i0,i1) are multiplied **elementwise** to set the value at coordinates (i0,i1) in matrix **F**. It is very similar to the matrix multiplication code we have been examining, but simpler.

<figure style="margin-left:auto; margin-right:auto; width:80%;">
    <img style="vertical-align:middle" src="../images/elementwise_multiplication.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Elementwise multiplication of matrices D and E to get F.</figcaption>
</figure>

The source code is located in [mat_elementwise_bug.cpp](mat_elementwise_bug.cpp) and is similar to the matrix multiplication example <a href="../L3_Matrix_Multiplication/mat_mult.cpp">mat_mult.cpp</a> in almost every aspect. The steps are: 

1. Device discovery and selection.
1. Matrices **D_h** and **E_h** allocated on the host and filled with random numbers.
1. Matrices **D_d** and **E_d** allocated on the compute device.
1. Matrices **D_h** and **E_h** uploaded to device allocations **D_d** and **E_d**.
1. The kernel **mat_elementwise** is run on the device to compute **F_d** from **D_d** and **E_d**.
1. **F_d** is copied to **F_h** and compared with the solution **F_answer_h** from sequential CPU code.
1. Memory and device cleanup.

This code has some critical bugs that produce rubbish output. It is your task to find these bugs using whatever means necessary!

## Run the solution

If we run the solution, it computes **F** by elementwise multiplication of matrices **D** and **E**. There is little or no residual between the computed matrix **F_h** and **F_answer_h**, the solution computed by serial CPU code.

```
!make mat_elementwise_answer.exe; ./mat_elementwise_answer.exe
```

```
make: 'mat_elementwise_answer.exe' is up to date.
	               name: NVIDIA GeForce RTX 3060 Laptop GPU 
	 global memory size: 6226 MB
	    max buffer size: 1556 MB
	     max local size: (1024,1024,64)
	     max work-items: 1024
The output array F_h (as computed with OpenCL) is
--------
|  4.67e-01  1.48e-01  4.33e-02  1.45e-01  5.56e-01  6.46e-01  2.48e-01  8.07e-02 |
|  6.41e-01  3.45e-01  4.86e-01  3.20e-01  5.99e-01  8.48e-02  8.26e-02  3.35e-01 |
|  1.29e-01  3.28e-01  3.96e-02  1.28e-01  1.50e-01  3.18e-03  2.00e-01  2.18e-02 |
|  8.87e-01  3.70e-01  3.96e-01  5.27e-01  4.40e-01  1.25e-01  3.29e-01  3.10e-01 |
--------
The CPU solution (F_answer_h) is 
--------
|  4.67e-01  1.48e-01  4.33e-02  1.45e-01  5.56e-01  6.46e-01  2.48e-01  8.07e-02 |
|  6.41e-01  3.45e-01  4.86e-01  3.20e-01  5.99e-01  8.48e-02  8.26e-02  3.35e-01 |
|  1.29e-01  3.28e-01  3.96e-02  1.28e-01  1.50e-01  3.18e-03  2.00e-01  2.18e-02 |
|  8.87e-01  3.70e-01  3.96e-01  5.27e-01  4.40e-01  1.25e-01  3.29e-
```

## Run the buggy application

Now run the application that has some bugs in it.

```
!make mat_elementwise.exe; ./mat_elementwise.exe
```

```
make: 'mat_elementwise.exe' is up to date.
	               name: NVIDIA GeForce RTX 3060 Laptop GPU 
	 global memory size: 6226 MB
	    max buffer size: 1556 MB
	     max local size: (1024,1024,64)
	     max work-items: 1024
The output array F_h (as computed with OpenCL) is
--------
|  1.21e-01  1.83e-02  4.67e-02  2.91e-01  3.45e-02  2.33e-02  7.31e-02  4.29e-01 |
|  1.20e-01  2.25e-01  3.12e-02  1.63e-01  4.72e-01  2.96e-03  1.05e-01  7.20e-02 |
|  6.73e-01  3.59e-01  1.77e-02  7.43e-01  2.56e-01  4.28e-02  2.46e-02  7.97e-02 |
|  3.88e-01  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00 |
--------
The CPU solution (F_answer_h) is 
--------
|  1.21e-01  1.83e-02  4.67e-02  2.91e-01  3.45e-02  2.33e-02  7.31e-02  4.29e-01 |
|  1.20e-01  2.25e-01  3.12e-02  1.63e-01  4.72e-01  2.96e-03  1.05e-01  7.20e-02 |
|  6.73e-01  3.59e-01  1.77e-02  7.43e-01  2.56e-01  4.28e-02  2.46e-02  7.97e-02 |
|  3.88e-01  4.51e-01  2.38e-01  8.18e-03  1.11e-01  1.81e-02  1.62e-01  6.2
```

For some reason, every element in the last row of **F_h** except the first is zero.

## Tasks

Your task is to find the errors using any of the techniques from the lesson. If you get frustrated, you can diff the buggy source against the solution code. If you do, try to understand why each bug corrupted the result.

<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a> for the Pawsey Supercomputing Centre
</address>