# SYCL Migration - OceanFFT

##### Sections
- [Introduction](#Introduction)
- [Analyze CUDA source](#Analyze-CUDA-source)
- [Migrate CUDA source to SYCL source](#Migrate-CUDA-source-to-SYCL-source)
- [Analyze, Compile and Run the migrated SYCL source](#Analyze,-Compile-and-Run-the-migrated-SYCL-source)
- [Source Code](#Source-Code)

## Learning Objectives
* Use SYCLomatic Tool to migrate a simple single source CUDA application
* Use various command line options of `SYCLomatic` for CUDA to SYCL migration
* Compile and run migrated SYCL code on Intel CPUs and GPUs
* Optimize the migrated SYCL code with manual coding

## Introduction

This module walks you through migrating CUDA code to SYCL code using the Intel SYCLomatic Tool.

#### Requirements
1. NVIDIA CUDA development machine
2. Development machine with an Intel CPU/GPU, or an Intel Developer Cloud account

#### Migration Process
We will do the following steps in this hands-on workshop:
- Analyze CUDA source
- Migrate CUDA source to SYCL source
- Analyze, Compile and Run the migrated SYCL source

## Analyze CUDA source

The CUDA source for the "OceanFFT" example is available on [NVIDIA GitHub](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/oceanFFT/).

Pull the entire repository on your CUDA Development machine.

```
git clone https://github.com/NVIDIA/cuda-samples.git

cd cuda-samples/Samples/4_CUDA_Libraries/oceanFFT/
```

The CUDA source simulates an ocean height field using the cuFFT library. The code is in the following files:

[__oceanFFT_kernel.cu, oceanFFT.cpp__](https://github.com/NVIDIA/cuda-samples/blob/master/Samples/4_CUDA_Libraries/oceanFFT)

The OceanFFT sample performs the FFT computation as a sequence of steps, one after another:
- Generate the wave spectrum in the frequency domain
- Execute an inverse FFT to convert to the spatial domain
- Update the height map values based on the output of the FFT
- Calculate the slope by partial differences in the spatial domain
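
The last step in the list above, the slope calculation by partial differences, can be sketched on the host as follows. This is a minimal illustration, not the sample's kernel: the `Slope` struct and the `calculate_slope` function are hypothetical names, and the sketch assumes a row-major height field and leaves border points at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: slope components along x and y.
struct Slope { float dx; float dy; };

// Slope of a row-major height field by central differences.
// The differences are not divided by the grid spacing.
std::vector<Slope> calculate_slope(const std::vector<float>& h,
                                   std::size_t width, std::size_t height) {
    assert(h.size() == width * height);
    std::vector<Slope> slope(h.size(), Slope{0.0f, 0.0f});
    // Only interior points have both neighbours; border points stay at zero.
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            const std::size_t i = y * width + x;
            slope[i].dx = h[i + 1] - h[i - 1];          // difference along x
            slope[i].dy = h[i + width] - h[i - width];  // difference along y
        }
    }
    return slope;
}
```

A host reference like this can be handy for spot-checking the output of the migrated slope kernel on a small mesh.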


## Migrate CUDA source to SYCL source

<p style="background-color:#cdc"> Note: A CUDA development machine is required to accomplish the task in this section </p>

Now that we have analyzed the CUDA source, we will migrate the CUDA source into SYCL source using the __SYCLomatic Tool__.

In this exercise, we will walk you through step-by-step to migrate the CUDA code.

#### Requirements

Make sure you have an __NVIDIA CUDA development machine__ that can __compile and run CUDA code__. The next step is to install the tools for migrating CUDA to SYCL:

- Install SYCLomatic Tool on this machine
  - Go to https://github.com/oneapi-src/SYCLomatic/releases/
  - Copy the link to the latest `linux_release.tgz` from the release assets
  - on the CUDA development machine: `mkdir syclomatic; cd syclomatic`
  - `wget <link to linux_release.tgz>`
  - `tar -xvf linux_release.tgz`
  - `export PATH="/home/$USER/syclomatic/bin:$PATH"`
  - Verify installation: `c2s --version`
- Pull the CUDA samples repo to this machine
  - `git clone https://github.com/NVIDIA/cuda-samples.git`
- Compile and run the `oceanFFT` sample
  - `cd cuda-samples/Samples/4_CUDA_Libraries/oceanFFT`
  - `make`


### Migrate CUDA source to SYCL source using SYCLomatic

On the NVIDIA CUDA development machine, go to the CUDA source folder and generate a compilation database with the tool `intercept-build`. This creates a JSON file that records every compiler invocation, including the names of the input files and the compiler options.

```
make clean
intercept-build make
```

This will create a file named `compile_commands.json` in the sample folder.

Next, use the SYCLomatic Tool (c2s) to migrate the code; it will store the result in the migration folder `dpct_output`:

```
c2s -p compile_commands.json --in-root ../../.. --gen-helper-function
```

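The `--gen-helper-function` option copies the SYCLomatic helper header files to the output directory.

The `--in-root` option specifies the root directory of the source tree to migrate. Only files under this root are migrated, so it is set to `../../..` here to include the common header files that the CUDA project depends on.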

This command should migrate the CUDA source to the C++ SYCL source in a folder named `dpct_output` by default, and the folder will have the C++ SYCL source along with any dependencies from the `Common` folder:

- `oceanFFT.cpp.dp.cpp`
- `oceanFFT_kernel.dp.cpp`

This command may also print a number of warnings about the migration process. Wherever CUDA code cannot be migrated automatically, the tool inserts warning comments into the migrated source files, and that code has to be migrated manually.


## Analyze, Compile and Run the migrated SYCL source

<p style="background-color:#cdc"> Note: The tasks in this section should be done on Intel Developer Cloud or on a system with the Intel oneAPI Base Toolkit installed.</p>

The migrated SYCL code is in the `Samples` folder under the `dpct_output` folder:
- `oceanFFT.cpp.dp.cpp`
- `oceanFFT_kernel.dp.cpp`

The `dpct_output` folder also has the header files needed for compiling the migrated SYCL code. The `Common` folder has header files with CUDA helper functions that were migrated to SYCL, and the `include` folder has header files with SYCLomatic helper functions.

#### Requirements

Make sure you have one of the following:
- __Development machine with Intel CPU/GPU__ with Intel oneAPI Base Toolkit installed
- __Intel Developer Cloud__ account to access the Intel CPUs/GPUs on the cloud

### Compiling migrated SYCL code

Copy the files mentioned above from the `dpct_output` folder on the __NVIDIA development machine__ to __Intel Developer Cloud__.

To compile the migrated SYCL code we can use the following command:
```
icpx -fsycl -fsycl-targets=intel_gpu_pvc -I ../../../Common -I ../../../include *.cpp
```

There may be compile errors based on whether all of the CUDA code was migrated to SYCL or not. The migrated code may also include comments with warning messages, which could help make it easier to fix the errors. These errors have to be manually fixed to get the code to compile.

#### Build and Run
Select the cell below and click run ▶ to compile and execute the code (expect to see errors):

`! ./q.sh run_dpct_output.sh`



### Compilation Error in migrated SYCL code

```
oceanFFT.cpp.dp.cpp:422:30: error: use of undeclared identifier 'CUDART_SQRT_HALF_F'
422 |       float h0_re = Er * P * CUDART_SQRT_HALF_F;
|
oceanFFT.cpp.dp.cpp:423:30: error: use of undeclared identifier 'CUDART_SQRT_HALF_F'
423 |       float h0_im = Ei * P * CUDART_SQRT_HALF_F;
|
2 errors generated.
```
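Some CUDA headers are not migrated to SYCL, so macros that they define and that the code uses are left undeclared. Here the missing macro is `CUDART_SQRT_HALF_F`, which has to be replaced manually.

##### Manually defined as below

`#define SYCLRT_SQRT_HALF_F 0.707106781f`

The two uses of `CUDART_SQRT_HALF_F` reported in the errors above must then refer to `SYCLRT_SQRT_HALF_F` instead.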
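
As a minimal sketch of this fix, the block below guards the manual definition and applies it the way the two failing lines do. The `Complex2` struct and the `scale_h0` helper are hypothetical names introduced only for illustration; they are not part of the sample.

```cpp
#include <cmath>

// Manual replacement for the CUDA constant CUDART_SQRT_HALF_F, which is
// undeclared in the migrated build. The value is sqrt(1/2) in single precision.
#ifndef SYCLRT_SQRT_HALF_F
#define SYCLRT_SQRT_HALF_F 0.707106781f
#endif

// Hypothetical type for this sketch: real and imaginary parts of h0.
struct Complex2 { float re; float im; };

// Hypothetical helper mirroring the two lines from the compiler errors:
// scales the random pair (Er, Ei) by the spectrum amplitude P and sqrt(1/2).
inline Complex2 scale_h0(float Er, float Ei, float P) {
    return Complex2{Er * P * SYCLRT_SQRT_HALF_F, Ei * P * SYCLRT_SQRT_HALF_F};
}
```

The `#ifndef` guard is a design choice in this sketch so that the definition does not clash if a header later provides the same name.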


<p style="background-color:#cdc"> Note: The OceanFFT CUDA sample also includes OpenGL rendering. Since SYCL does not support OpenGL, the OpenGL functions are not migrated.</p>

### Commented unmigrated CUDA code

Before migrating to SYCL, all OpenGL headers and functions in the CUDA source code had to be commented out, as shown below.

```
//#include <GLUT/glut.h>
//#include <GL/freeglut.h>
//#include <rendercheck_gl.h>
```
```
/*void runGraphicsTest(int argc, char **argv) {
// This is necessary in order to achieve optimal performance with OpenGL/CUDA
// First initialize OpenGL context, so we can properly set the GL for CUDA.
// interop.
if (false == initGL(&argc, argv)) {
	return;
}
// create vertex buffers and register with CUDA
createVBO(&heightVertexBuffer, meshSize * meshSize * sizeof(float));
checkCudaErrors(
cudaGraphicsGLRegisterBuffer(&cuda_heightVB_resource, heightVertexBuffer,
				 cudaGraphicsMapFlagsWriteDiscard));
createVBO(&slopeVertexBuffer, outputSize);
checkCudaErrors(
cudaGraphicsGLRegisterBuffer(&cuda_slopeVB_resource, slopeVertexBuffer,
				 cudaGraphicsMapFlagsWriteDiscard));
// create vertex and index buffer for mesh
createMeshPositionVBO(&posVertexBuffer, meshSize, meshSize);
createMeshIndexBuffer(&indexBuffer, meshSize, meshSize);
// register callbacks
glutDisplayFunc(display);
glutKeyboardFunc(keyboard);
glutMouseFunc(mouse);
glutMotionFunc(motion);
glutReshapeFunc(reshape);
glutTimerFunc(REFRESH_DELAY, timerEvent, 0);
// start rendering mainloop
glutMainLoop();
}*/
```

### Compile and Run the migrated SYCL source

Once you have successfully migrated the CUDA source to the SYCL source, verify that the migrated SYCL code is functioning correctly by compiling and running it on the Intel Developer Cloud, which has a variety of Intel CPUs and GPUs available for development.

#### Build and Run
Select the cell below and click run ▶ to compile and execute the code:

`! ./q.sh run_sycl_migrated.sh`



### SYCL Code Migration Analysis

When comparing the CUDA code and migrated SYCL code, we can see that there are some 1:1 equivalent calls, which are listed below in the table:

| Functionality|CUDA|SYCL
|-|-|-
| library header file|`#include <cufft.h>`|`#include <dpct/fft_utils.hpp>`
| header file|`#include <cuda_runtime.h>`|`#include <CL/sycl.hpp>` <br> `#include <dpct/dpct.hpp>`
| Memory allocation on device| `cudaMalloc((void **)&d_h0, spectrumSize)`| `d_h0 = (sycl::float2 *)sycl::malloc_device(spectrumSize, dpct::get_default_queue())`
| FFT Data | `cufftHandle fftPlan`| `dpct::fft::fft_engine_ptr fftPlan`
| Copy memory between host and device| `cudaMemcpy(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice)`| `dpct::get_default_queue().memcpy(d_A, h_A, mem_size_A).wait()`
| Execute inverse FFT to convert to spatial domain | `cufftExecC2C(fftPlan, d_ht, d_ht, CUFFT_INVERSE)`| `fftPlan->compute<sycl::float2, sycl::float2>(d_ht, d_ht, dpct::fft::fft_direction::backward)`
| Create FFT Plan| `cufftPlan2d(&fftPlan, meshSize, meshSize, CUFFT_C2C);`| ` dpct::fft::fft_engine::create(&dpct::get_default_queue(), meshSize, meshSize, dpct::fft::fft_type::complex_float_to_complex_float)`
| Free device memory allocation| `cudaFree(d_h0)` | `sycl::free(d_h0, dpct::get_default_queue())`
| Free FFT memory allocation| `cufftDestroy(fftPlan)` | `dpct::fft::fft_engine::destroy(fftPlan)`

##### The main FFT operation uses the SYCL oneMKL FFT equivalent of the cuFFT library functions, as shown below:
##### cuFFT functions that perform the operation in the CUDA code:
```
// FFT data
cufftHandle fftPlan;
// create FFT plan
checkCudaErrors(cufftPlan2d(&fftPlan, meshSize, meshSize, CUFFT_C2C));
// execute inverse FFT to convert to spatial domain
checkCudaErrors(cufftExecC2C(fftPlan, d_ht, d_ht, CUFFT_INVERSE));
```
##### oneMKL FFT function to perform operation in SYCL code:
```
// FFT data
dpct::fft::fft_engine_ptr fftPlan;
// create FFT plan
DPCT_CHECK_ERROR(fftPlan = dpct::fft::fft_engine::create(
    &dpct::get_default_queue(), meshSize, meshSize,
    dpct::fft::fft_type::complex_float_to_complex_float));
// execute inverse FFT to convert to spatial domain
fftPlan->compute<sycl::float2, sycl::float2>(
    d_ht, d_ht, dpct::fft::fft_direction::backward);
```
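
To make concrete what the inverse (backward) transform computes, here is a naive reference for an unnormalized inverse DFT. It is a sketch for checking small cases by hand, not how cuFFT or oneMKL implement the transform. The function name `inverse_dft` is hypothetical, the sketch is 1D for brevity (a 2D transform applies it along rows and then columns), and it assumes the unnormalized convention that cuFFT uses.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) reference for an unnormalized 1D inverse DFT:
//   x[n] = sum over k of X[k] * exp(+2*pi*i*k*n/N)
// Sums are accumulated in double precision and returned as float.
std::vector<std::complex<float>> inverse_dft(
    const std::vector<std::complex<float>>& X) {
    const std::size_t N = X.size();
    const double two_pi = 6.283185307179586;
    std::vector<std::complex<float>> x(N);
    for (std::size_t n = 0; n < N; ++n) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t k = 0; k < N; ++k) {
            const double angle =
                two_pi * static_cast<double>(k * n) / static_cast<double>(N);
            sum += std::complex<double>(X[k].real(), X[k].imag()) *
                   std::complex<double>(std::cos(angle), std::sin(angle));
        }
        x[n] = std::complex<float>(static_cast<float>(sum.real()),
                                   static_cast<float>(sum.imag()));
    }
    return x;
}
```

Because no 1/N factor is applied, a spectrum whose only nonzero entry is `X[0] = 4` yields the constant value 4 at every output point.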


## Source Code

This section describes the location of the CUDA source and the contents of different SYCL source code directories in this project.

| folder name | source code description
| --- | ---
| [CUDA GitHub](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/oceanFFT) | Original CUDA source used for migration
| dpct_output | Contains the output of the SYCLomatic Tool, which migrates the CUDA code to SYCL-compliant code. This SYCL code has some unmigrated code that must be fixed manually to get full functionality. (The code does not work as generated.)
| sycl_migrated | Contains manually migrated SYCL code from CUDA code.


## Summary

In this module we learned how to migrate a simple CUDA source to SYCL using `SYCLomatic`, and then analyzed and fixed the migrated SYCL source by manual coding to get it working.