## **Lab 4**

### **Overview**

This is another CUDA-to-SYCL migration sample, this time one that uses the **cuBLAS (CUDA Basic Linear Algebra Subroutines)** library. The library leverages Tensor Cores to accelerate low- and mixed-precision matrix multiplication.

**oneAPI Math Kernel Library (oneMKL)** provides optimized math routines, ranging from vector and matrix operations in Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) to fast Fourier transforms (FFT) and random number generator (RNG) functions. oneMKL supports heterogeneous computing via SYCL and OpenMP offload.

### **Exercise**

#### 1) Git clone CUDA Samples

In [None]:
# Note: cuda-samples has already been cloned ahead of time; clone it only if it is missing
! [ ! -d /app/notebooks/cuda-samples ] && git clone https://github.com/NVIDIA/cuda-samples.git /app/notebooks/cuda-samples

#### 2) Make a copy of CUDA sample 'matrixMulCUBLAS' inside lab-4

In [None]:
# Make a fresh copy of CUDA 'matrixMulCUBLAS' sample
! [ -d cuda-samples ] && rm -rf cuda-samples
! mkdir -p cuda-samples/Samples/4_CUDA_Libraries/
! cp -rf /app/notebooks/cuda-samples/Common cuda-samples/
! cp -rf /app/notebooks/cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS  cuda-samples/Samples/4_CUDA_Libraries/

**Information:**
* cuda-samples/Common - CUDA helper header
* matrixMulCUBLAS - a CUDA sample that uses cuBLAS for matrix multiplication.

#### 3) Review matrixMulCUBLAS.cpp

In [None]:
! cat /app/notebooks/cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/matrixMulCUBLAS.cpp

#### 4) Use intercept-build to obtain CUDA sample project compilation dependency

**Note:** 
* Each Jupyter Notebook shell command (! \<bash command\>) runs in its own sub-process, so process state such as the working directory does not persist to the next ! \<bash command\>.
* For convenience, we chain commands with '&&' so that each task runs in the intended directory.
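
The effect described in this note can be reproduced in plain shell. The following is a minimal sketch in which each parenthesised subshell stands in for one `!` line; `/tmp` is only an illustrative directory and is not part of the lab.

```shell
# Each ( ... ) subshell stands in for one "! <bash command>" line in the notebook.
start_dir="$(pwd)"

# A "cd" on its own line changes directory only inside that sub-process.
( cd /tmp )
after_cd="$(pwd)"            # still equal to $start_dir

# Chaining with '&&' keeps both commands in the same sub-process,
# so the second command runs in the new location.
chained_dir="$(cd /tmp && pwd)"

echo "after separate cd: $after_cd"
echo "inside chained cd: $chained_dir"
```

This is why every cell in this lab repeats the `cd <directory> &&` prefix instead of changing directory once.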

In [None]:
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && make clean
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && intercept-build make

In [None]:
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && cat compile_commands.json

**Information:**
* compile_commands.json - contains the compilation commands captured from the CUDA build.
* nvcc - the CUDA compiler, which compiles matrixMulCUBLAS.cpp and produces the "matrixMulCUBLAS" executable.
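
To get a feel for the structure of a compilation database without reading the full `cat` output, the sketch below writes a hypothetical one-entry database to a temporary file and extracts its fields. The directory, command line, and flags in it are illustrative assumptions, not the actual output of `intercept-build` on this sample; only the `directory`, `command`, and `file` keys follow the JSON compilation database format.

```shell
# Hypothetical single-entry database; the real file is written by intercept-build.
db="$(mktemp)"
cat > "$db" <<'EOF'
[
  {
    "directory": "/path/to/matrixMulCUBLAS",
    "command": "nvcc -c -I../../../Common -o matrixMulCUBLAS.o matrixMulCUBLAS.cpp",
    "file": "matrixMulCUBLAS.cpp"
  }
]
EOF

# Count the entries and list the source files they refer to.
entry_count="$(grep -c '"file"' "$db")"
source_files="$(sed -n 's/.*"file": *"\([^"]*\)".*/\1/p' "$db")"

echo "entries: $entry_count"
echo "sources: $source_files"
rm -f "$db"
```

The same `grep` and `sed` lines can be pointed at the real `compile_commands.json`, provided each key sits on its own line as in this example.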

#### 5) Use the SYCLomatic tool to convert CUDA code to SYCL C++

In [None]:
# If sycl_output exists, we delete it for a fresh 'sycl_output' SYCL conversion
! [ -d cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/sycl_output ] && rm -rf cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/sycl_output
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && c2s -p compile_commands.json --in-root ../../.. --gen-helper-function --cuda-include-path=/usr/local/cuda-12.1/include --out-root=sycl_output 

**Information:**
* --in-root ../../.. : specify the root directory that contains the sources and common include files of the CUDA sample project, i.e. cuda-samples/
* --gen-helper-function : generate the SYCLomatic helper header files in the output directory
* --cuda-include-path=<path to CUDA include> : specify the CUDA include header path
* --out-root=<SYCL output directory> : specify the output directory for the migrated SYCL code

**Note:**
* oneAPI Base Toolkit version 2023.02 supports CUDA Toolkit version 12.1.
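
Because the CUDA include path depends on the installed toolkit version, it can help to check the prerequisites before migrating. The following is a sketch only: the `CUDA_INC` variable and the `migrate_status` name are introduced here for illustration, and the `c2s` flags are exactly the ones used in the cell above.

```shell
# CUDA_INC is a hypothetical override; the default matches the path used in this lab.
CUDA_INC="${CUDA_INC:-/usr/local/cuda-12.1/include}"

if command -v c2s >/dev/null 2>&1 && [ -d "$CUDA_INC" ] && [ -f compile_commands.json ]; then
  # Same flags as the notebook cell; run from the matrixMulCUBLAS directory.
  if c2s -p compile_commands.json --in-root ../../.. --gen-helper-function \
         --cuda-include-path="$CUDA_INC" --out-root=sycl_output; then
    migrate_status="migrated"
  else
    migrate_status="failed"
  fi
else
  migrate_status="skipped"
  echo "c2s, $CUDA_INC, or compile_commands.json not found; skipping migration" >&2
fi
echo "migration: $migrate_status"
```

Setting `CUDA_INC` before running the snippet points it at a different toolkit installation without editing the command itself.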

#### 6) Review the SYCL output

In [None]:
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && tree sycl_output

**Information:**
* MainSourceFile.yaml : CUDA to SYCL conversion log
* Common/ : CUDA helper headers from cuda-samples
* include/ : SYCLomatic helper headers
* Samples/4_CUDA_Libraries/matrixMulCUBLAS : the SYCL C++ code

In [None]:
# Check the converted SYCL code (matrixMulCUBLAS.cpp.dp.cpp)
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS && cat sycl_output/Samples/4_CUDA_Libraries/matrixMulCUBLAS/matrixMulCUBLAS.cpp.dp.cpp
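
The migrated file can be long, so searching it may be easier than reading the whole `cat` output. SYCLomatic marks places that may need manual attention with inline comments carrying a `DPCT` diagnostic ID. The helper below (the name `count_dpct_notes` is hypothetical) counts such lines under a directory. Whether this particular sample produces any is not stated in this lab, so treat the number as something to inspect rather than an expected value.

```shell
# count_dpct_notes DIR: print the number of lines under DIR that carry a
# SYCLomatic diagnostic ID such as "DPCT1003:0:". Hypothetical helper name.
count_dpct_notes() {
  grep -rE 'DPCT[0-9]{4}:[0-9]+:' "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Usage in this lab, from the matrixMulCUBLAS directory:
if [ -d sycl_output ]; then
  echo "DPCT notes: $(count_dpct_notes sycl_output)"
fi
```

Dropping the `wc -l` stage and adding `-n` to `grep` lists the matching lines with their file names and line numbers instead of counting them.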

#### 7) Compile the SYCL code using the DPC++ compiler

In [None]:
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/sycl_output/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && icpx -fsycl -I ../../../Common -I ../../../include *.cpp -lmkl_sycl -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -o matrixMulCUBLAS_prog

**Information:**
* -lmkl_sycl, -lmkl_intel_ilp64, -lmkl_sequential and -lmkl_core link the oneMKL libraries.
* The oneMKL Link Line Advisor (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html) provides the recommended oneMKL linker flags.


In [None]:
# Check matrixMulCUBLAS_prog executable file information
! file cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/sycl_output/Samples/4_CUDA_Libraries/matrixMulCUBLAS/matrixMulCUBLAS_prog

#### 8) Run the Matrix Multiplication BLAS SYCL C++ program

In [None]:
! cd cuda-samples/Samples/4_CUDA_Libraries/matrixMulCUBLAS/sycl_output/Samples/4_CUDA_Libraries/matrixMulCUBLAS/ && ./matrixMulCUBLAS_prog
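
To make the result of the run easy to check from a script, the exit status can be captured. This is a minimal sketch that assumes the migrated program returns zero on success, which is an assumption and is not confirmed in this lab; the `run_and_report` helper name is hypothetical.

```shell
# run_and_report CMD...: run a command and print PASS or FAIL from its exit status.
run_and_report() {
  if "$@"; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# Usage in this lab, from the directory that contains the executable:
if [ -x ./matrixMulCUBLAS_prog ]; then
  run_and_report ./matrixMulCUBLAS_prog
fi
```

The program's own output is printed first, followed by the single PASS or FAIL line derived from its exit status.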

### **Conclusion:**

* The cuBLAS library calls are converted to use the oneMKL library.
* When compiling a SYCL C++ program that uses a oneAPI library, we need to link the matching libraries with -l<oneapi_lib>.

**Notices & Disclaimers** 

Intel technologies may require enabled hardware, software or service activation. 

No product or component can be absolutely secure.  

Your costs and results may vary.  

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (0BSD), [Open Source Initiative](https://opensource.org/licenses/0BSD). No rights are granted to create modifications or derivatives of this document. 

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.  