# SYCL Migration - Introduction

##### Sections
- [Accelerating Choice with SYCL](#Accelerating-Choice-with-SYCL)
- [CUDA vs SYCL - Program Structure](#CUDA-vs-SYCL---Program-Structure)
- [CUDA vs SYCL - Execution Model](#CUDA-vs-SYCL---Execution-Model)
- [SYCLomatic Tool Introduction](#SYCLomatic-Tool-Introduction)
- [Migration Workflow Overview](#Migration-Workflow-Overview)
- [Tips for Migrating CUDA to SYCL using SYCLomatic](#Tips-for-Migrating-CUDA-to-SYCL-using-SYCLomatic)

## Learning Objectives
* Explain the advantages of using C++ with SYCL to program accelerators
* Explain the differences in program structure and execution model between CUDA and SYCL
* Set up the SYCLomatic Tool and explain the migration flow

## Accelerating Choice with SYCL

C++ and SYCL deliver a unified programming model, performance portability, and C++ alignment for applications using accelerators from different vendors. Programming in C++ with SYCL offers several advantages:

- Open, standards-based
- Multiarchitecture performance 
- Freedom from vendor lock-in
- Comparable performance to native CUDA on Nvidia GPUs
- Extension of widely used C++ language


## CUDA vs SYCL - Program Structure

NVIDIA introduced CUDA, which originally stood for “Compute Unified Device Architecture,” as a general-purpose parallel computing platform and programming model for solving complex computational problems on NVIDIA GPUs.

A CUDA program comprises two primary components: host code and GPU kernels. Host code runs locally on a CPU, while kernels are functions that run on GPU devices. Kernel execution can be completely independent of host execution.

SYCL (pronounced “sickle”) is a cross-platform C++ abstraction layer introduced by the Khronos Group, originally as a C++ programming model for OpenCL. SYCL derives its device execution and memory models from OpenCL.

The single-source programming model in SYCL allows kernel code to be embedded in the host code. SYCL also supports a single-source multiple-compiler-passes technique, which allows a single source file to be parsed by multiple compilers to generate device binaries. SYCL memory objects can be created in the form of buffers, sub-buffers, and images.

## CUDA vs SYCL - Execution Model

The CUDA execution model exposes an abstract view of the NVIDIA GPU architecture. According to the CUDA programming guide, all devices preserve the same fundamental concepts. An NVIDIA GPU is built as an array of streaming multiprocessors (SMs). Each SM is responsible for executing groups of threads. Once a group of threads is allocated to an SM, its threads remain on that SM for their lifetime. Each SM consists of a set of cores, shared memory, registers, and a scheduler unit.

The CUDA thread hierarchy is composed of a grid of thread blocks. A thread block is a set of threads that execute on the same SM. Each thread block has its own block ID within the grid, and a thread block can be one-, two-, or three-dimensional. A grid is an array of thread blocks launched by a kernel that reads inputs from and writes results to global memory. A grid can also be one-, two-, or three-dimensional.
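
As a rough illustration of the hierarchy above, the sketch below uses plain host-side C++ (no CUDA toolchain required) to reproduce the arithmetic behind a one-dimensional launch: how many thread blocks are needed to cover `n` elements, and which global index a thread derives from its block ID and thread ID. The names `blocks_for` and `global_thread_index` are hypothetical helpers introduced here, not CUDA APIs; inside a real kernel the index is usually written as `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cstddef>

// Hypothetical helper: number of thread blocks needed so that
// blocks * threads_per_block >= n (ceiling division).
constexpr std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
  return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper mirroring blockIdx.x * blockDim.x + threadIdx.x
// for a one-dimensional grid of one-dimensional blocks.
constexpr std::size_t global_thread_index(std::size_t block_id,
                                          std::size_t block_dim,
                                          std::size_t thread_id) {
  return block_id * block_dim + thread_id;
}
```

For 1000 elements and 256 threads per block, `blocks_for` yields 4 blocks (1024 threads in total), so a kernel would typically guard its body with a bounds check such as `if (i < n)` to skip the 24 surplus threads.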

In SYCL, work is described as a command group (CG). CGs are submitted to a SYCL queue through its `submit` function. When a CG is submitted to the queue, the SYCL runtime analyzes its data dependencies and schedules execution. The SYCL thread hierarchy is a one-, two-, or three-dimensional set of work items called an ND-range. A work item is a single thread within the thread hierarchy. Work items are grouped into equally sized work groups of the same dimensionality; a work group is likewise a one-, two-, or three-dimensional set of work items. An ND-range has three components: the global range, the local range, and the number of work groups.
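
The relationship between the three ND-range components can be sketched in plain C++ as follows. The struct `NdRange1D` and the functions `from_cuda_launch` and `global_id` are hypothetical stand-ins for illustration only, not the `sycl::nd_range` class. The sketch assumes a one-dimensional range and assumes, as SYCL 2020 requires for ND-range kernels, that the global range is a multiple of the local range.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical 1-D model of an ND-range (not the sycl::nd_range class).
struct NdRange1D {
  std::size_t global_range;  // total number of work items
  std::size_t local_range;   // work items per work group

  // Number of work groups; the global range must divide evenly.
  std::size_t group_count() const {
    if (local_range == 0 || global_range % local_range != 0)
      throw std::invalid_argument(
          "global range must be a multiple of local range");
    return global_range / local_range;
  }
};

// Build the model from a CUDA-style launch configuration:
// grid size (blocks) times block size (threads) gives the global range.
inline NdRange1D from_cuda_launch(std::size_t grid_dim, std::size_t block_dim) {
  return NdRange1D{grid_dim * block_dim, block_dim};
}

// Global ID of a work item computed from its group ID and local ID.
inline std::size_t global_id(const NdRange1D& r, std::size_t group_id,
                             std::size_t local_id) {
  return group_id * r.local_range + local_id;
}
```

In terms of the CUDA hierarchy, the ND-range plays the role of the grid, a work group corresponds to a thread block, and a work item corresponds to a thread. A CUDA launch of 4 blocks with 256 threads each therefore maps to a global range of 1024 and a local range of 256.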

## SYCLomatic Tool Introduction

The SYCLomatic Tool is an open-source tool available on [GitHub](https://github.com/oneapi-src/SYCLomatic). Daily builds are published under GitHub releases and need to be downloaded and installed on your CUDA development machine.

The SYCLomatic Tool assists in the migration of a developer's program that is written in CUDA to a program written in SYCL.

Visit the [SYCLomatic Tool GitHub Readme](https://github.com/oneapi-src/SYCLomatic) for additional information about the tool. Visit the Release Notes for known issues and the most up-to-date information.

<p style="background:gray;color:white">
<b>NOTE:</b> Migration will result in a project that is not entirely converted. Additional work, as outlined by the output of the SYCLomatic Tool, is required to complete the conversion.</p>

## Migration Workflow Overview

In most cases, migration of a user’s CUDA source code to SYCL code with the SYCLomatic Tool can be divided into three stages: preparation, migration, and review.

<img src="assets/steps.png">

In the preparation stage, the project directory is cleaned, compile options are noted, and in some cases, source files may need to be modified. For most makefile-based projects, we recommend running the intercept-build script, which automatically tracks the compilation commands, flags, and options and saves them in a JSON file. For Microsoft Visual Studio projects, ensure the .vcxproj file exists; it can be passed to the SYCLomatic Tool to keep track of project options. For simple projects, compile options and macros can be specified manually when running `c2s`. When running intercept-build on the command line, pass it the build command:

```intercept-build make```

In the migration stage, the SYCLomatic Tool executable `c2s` is run. It takes in the original application as an input, analyzes its headers and sources as well as the generated compile_commands.json if it exists, and outputs SYCL code and reports.

```c2s -p ./ --in-root ./ --out-root output *.cu```

If intercept-build was not run, compile options can also be specified manually as c2s arguments.

```c2s --out-root=output source.cu --extra-arg="-I./include" --extra-arg="-DBUILD_CUDA"```

In the final review stage, manual verification and edits are required. For parts of the code that the SYCLomatic Tool is unable to migrate, the user will need to fix the migrated code and ensure that it is correct. For portions of code that require manual intervention, SYCLomatic messages are logged as comments in the migrated source files for easy discovery. For information on manually completing the migration process based on the SYCLomatic messages, refer to the [Diagnostics Reference of the SYCLomatic Tool User Guide](https://software.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/diagnostics-reference.html).
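
During review it can help to list which diagnostics appear in a migrated file before looking each one up in the Diagnostics Reference. The sketch below is a minimal, hypothetical helper in standard C++ (`find_diagnostic_ids` is not part of SYCLomatic). It assumes that diagnostic messages are embedded in comments with an identifier of the form `DPCT` followed by four digits, such as `DPCT1003`; adjust the pattern if your tool version formats them differently.

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: collect the distinct diagnostic IDs found in the text
// of a migrated source file.
// Assumption: IDs look like "DPCT" followed by exactly four digits.
inline std::set<std::string> find_diagnostic_ids(const std::string& source) {
  static const std::regex pattern(R"(DPCT\d{4})");
  std::set<std::string> ids;
  for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
       it != end; ++it) {
    ids.insert(it->str());  // std::set keeps each ID once, in sorted order
  }
  return ids;
}
```

Reading a migrated file into a `std::string` and passing it to this function gives a sorted list of IDs that can be worked through one by one.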

The following diagram illustrates the workflow and the files generated when using the SYCLomatic Tool.

<img src="assets/syclomatic_workflow.png">

## Tips for Migrating CUDA to SYCL using SYCLomatic

The sections below include useful tips to get started with migration from CUDA to SYCL using the SYCLomatic Tool.

### CUDA Development Machine – Sanity Check

You will need a CUDA development machine to migrate CUDA source to SYCL source. The CUDA development machine should have a CUDA SDK installed and be able to compile and run CUDA code. The steps below will help you check that you have the right configuration to proceed:

- Run `nvidia-smi` in terminal
  - `nvidia-smi`
  - Check that the driver version is displayed, the NVIDIA card is detected, and the CUDA version is displayed
- Check CUDA headers in install path
  - Default installation path is `/usr/local/cuda/include`
  - If CUDA is installed in a non-default path, make a note of the path; you will need it later
- Check that you are able to compile a simple CUDA code
  - `nvcc --help`
  - `nvcc test.cu`

### Installing SYCLomatic

The SYCLomatic Tool is installed on your CUDA development machine where you will be running the tool to migrate CUDA code to SYCL code.

- Go to https://github.com/oneapi-src/SYCLomatic/releases
  - Under Assets
  - Copy web link to `linux_release.tgz`
- On your CUDA development machine:
  - In home directory or anywhere: `mkdir syclomatic; cd syclomatic`
  - `wget <link to linux_release.tgz>`
  - `tar -xvf linux_release.tgz`
  - `export PATH="/home/$USER/syclomatic/bin:$PATH"`
  - `c2s --version`
  
### SYCLomatic tool usage

Below are some useful command-line options of the SYCLomatic Tool (`c2s`). Run `c2s --help` for the full list of options.

- Migrate a single CUDA source file:
  - `c2s test.cu`
- Migrate a single CUDA source file and copy all SYCLomatic helper header files:
  - `c2s test.cu --gen-helper-function`
- Migrate a single CUDA source file to a specific output directory:
  - `c2s test.cu --out-root sycl_code`
- Migrate a single CUDA source file with a source root tree:
  - `c2s test.cu --in-root ../..`
- Migrate a single CUDA source file with a custom CUDA installation:
  - `c2s test.cu --cuda-include-path /tmp/cuda/include`
- Migrate a CUDA project with makefile:
  - `intercept-build make`
  - `c2s -p compile_commands.json` OR `c2s -p .`
- Migrate a CUDA project with makefile and generate makefile for building SYCL:
  - `intercept-build make`
  - `c2s -p . --gen-build-script`


### Compiling SYCL on Intel and Other hardware

- Install __Intel oneAPI C++/DPC++ Compiler__ or __Intel oneAPI Base Toolkit__
- Install CUDA Plugin for oneAPI from Codeplay
  - Link to [Installation Instructions](https://developer.codeplay.com/products/oneapi/nvidia/)

- Set environment variable for using the Compiler
  - `source /opt/intel/oneapi/setvars.sh`

- Compile SYCL for Intel CPUs/GPUs
  - `icpx -fsycl test.cpp`

- Compile SYCL for Nvidia* GPUs
  - `icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda test.cpp`

### Program Structure of Migrated SYCL code

The SYCLomatic migrated code uses SYCL functions and methods to provide the functionality of the CUDA APIs. The SYCL Specification has more information about all the SYCL classes, functions, and methods.

#### dpct/dpct.hpp header file

The migrated code includes the header file `dpct/dpct.hpp`, which provides helper functions that wrap SYCL calls. When the SYCLomatic (`c2s`) option `--use-custom-helper=api` is used during migration, all the helper functions used in the migrated code are placed in a folder called `include` under the output folder. The header files with all helper functions are also available in the `include` folder of the SYCLomatic installation.

#### In order queue property

By default, the SYCLomatic migrated code creates a `sycl::queue` with the `in_order` property. If the application submits multiple kernels that have no data dependencies between them, you may be able to get more performance by removing the `in_order` queue property and allowing the kernels to execute concurrently.

The code used to create `sycl::queue` in SYCLomatic migrated code will look as shown below:

```cpp
  dpct::device_ext &dev_ct1 = dpct::get_current_device();
  sycl::queue &q_ct1 = dev_ct1.default_queue();
```

The above code uses `dpct` helper functions to create a `sycl::queue` with the `in_order` queue property. The above code can be rewritten without the `dpct` helper functions, as shown below:

```cpp
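   // Braces avoid the "most vexing parse": with parentheses, this line would
   // declare a function named q_ct1 instead of constructing a queue.
   sycl::queue q_ct1{sycl::property::queue::in_order()};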
```
If you plan to allow out-of-order and concurrent execution of kernels, you will have to remove the `in_order` queue property, as shown below:

```cpp
   sycl::queue q_ct1;
```

Note that removing the `in_order` queue property may result in data race conditions if there are any data dependencies between the kernels. You may have to analyze the kernel code and add event-based dependencies where necessary.
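
To see why dropping `in_order` can help and where it can hurt, the sketch below models kernel scheduling in plain C++. It is an illustrative model, not the SYCL runtime: `Kernel`, `steps_in_order`, and `steps_out_of_order` are hypothetical names, and every kernel is assumed to take exactly one time step. In real SYCL code, the explicit dependencies modeled here would typically be expressed by passing the `sycl::event` returned from `queue::submit` to `handler::depends_on` in a later command group.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of a submitted kernel: the indices of earlier kernels
// whose results it needs. Assumes each index is smaller than its own.
struct Kernel {
  std::vector<std::size_t> depends_on;
};

// In-order queue: every kernel waits for the previous one,
// so the number of sequential steps equals the number of kernels.
inline std::size_t steps_in_order(const std::vector<Kernel>& kernels) {
  return kernels.size();
}

// Out-of-order queue: a kernel starts as soon as its latest dependency has
// finished; kernels without dependencies can all run in the first step.
inline std::size_t steps_out_of_order(const std::vector<Kernel>& kernels) {
  std::vector<std::size_t> finish(kernels.size(), 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < kernels.size(); ++i) {
    std::size_t start = 0;
    for (std::size_t dep : kernels[i].depends_on)
      start = std::max(start, finish[dep]);
    finish[i] = start + 1;
    total = std::max(total, finish[i]);
  }
  return total;
}
```

With two independent kernels followed by a third that consumes both results, the model needs 3 steps in order but only 2 out of order. If the third kernel's dependencies were left out, the model would let it start in the first step alongside the kernels that produce its inputs, which is exactly the kind of data race described above.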

#### Device selection

By default, the device selector used for execution of kernels is `sycl::default_selector_v`, which may select a GPU or a CPU. If you want the kernels to execute on a GPU only, pass `sycl::gpu_selector_v` when creating the `sycl::queue`, as shown below:

```cpp
   sycl::queue q_ct1(sycl::gpu_selector_v);

   // OR, to also keep the in_order property:

   sycl::queue q_ct1{sycl::gpu_selector_v, sycl::property::queue::in_order()};
```

Re-write the `sycl::queue` creation code in the SYCLomatic migrated code based on your preference.

## Reference

- [SYCLomatic Tool Download](https://github.com/oneapi-src/SYCLomatic/releases)
- [One Stop Portal for CUDA to SYCL Migration](https://www.intel.com/content/www/us/en/developer/tools/oneapi/training/migrate-from-cuda-to-cpp-with-sycl.html)
- [SYCL 2020 Specification](https://registry.khronos.org/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf)
- [SYCL Code Samples](https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL)
- [Intel GPU Optimization Guide](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-gpu-optimization-guide/top.html)
