# SYCL Migration - MonteCarloMultiGPU

##### Sections
- [Introduction](#Introduction)
- [Analyze CUDA source](#Analyze-CUDA-source)
- [Migrate CUDA source to SYCL source](#Migrate-CUDA-source-to-SYCL-source)
- [Analyze, Compile and Run the migrated SYCL source](#Analyze,-Compile-and-Run-the-migrated-SYCL-source)
- [Source Code](#Source-Code)

## Learning Objectives
* Use SYCLomatic Tool to migrate a simple single source CUDA application
* Use various command line options of `SYCLomatic` for CUDA to SYCL migration
* Compile and run migrated SYCL code on Intel CPUs and GPUs
* Optimize the migrated SYCL code with manual coding

## Introduction

This module walks you through migrating CUDA code to SYCL code using the Intel SYCLomatic Tool.

#### Requirements
1. NVIDIA CUDA development machine
2. Development machine with an Intel CPU/GPU, or an Intel Developer Cloud account

#### Migration Process
We will do the following steps in this hands-on workshop:
- Analyze CUDA source
- Migrate CUDA source to SYCL source
- Analyze, Compile and Run the migrated SYCL source

## Analyze CUDA source

The CUDA source for the "MonteCarloMultiGPU" example is available on [NVIDIA GitHub](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/5_Domain_Specific/MonteCarloMultiGPU).

Pull the entire repository on your CUDA Development machine.

```
git clone https://github.com/NVIDIA/cuda-samples.git

cd cuda-samples/Samples/5_Domain_Specific/MonteCarloMultiGPU/
```

The CUDA source demonstrates how to calculate the price of European options by applying the Black-Scholes formula and a Monte Carlo approach.

The main source files of the sample are [__MonteCarloMultiGPU.cpp MonteCarlo_kernel.cu MonteCarlo_gold.cpp__](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/5_Domain_Specific/MonteCarloMultiGPU).

The Monte Carlo method first draws a random number from a probability distribution. That random number is combined with the volatility and the time to expiration to generate a stock price at expiration, which is then used to calculate the value of the option. The model repeats this calculation many times, each time with a different set of random values from the probability functions.

The first stage of the computation is the generation of a normally distributed N(0, 1) number sequence, which comes down to uniformly distributed sequence generation. Once we’ve generated the desired number of samples, we use them to compute an expected value and confidence width for the underlying option.
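The two stages described above can be sketched in plain host-side C++. This is a minimal illustration under stated assumptions, not the sample's kernel: the names `McEstimate` and `monte_carlo_call`, the use of `std::mt19937_64` with `std::normal_distribution`, and the factor 1.96 for a 95% confidence width are all choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical result type: discounted mean payoff and 95% confidence width.
struct McEstimate {
    double expected;
    double confidence;
};

// Sketch: price a European call by simulating terminal stock prices
// under geometric Brownian motion with constant drift and volatility.
inline McEstimate monte_carlo_call(double S, double K, double r, double sigma,
                                   double T, std::size_t paths, unsigned seed) {
    assert(paths > 1 && sigma > 0.0 && T > 0.0);
    std::mt19937_64 engine(seed);                      // uniform bit source
    std::normal_distribution<double> gauss(0.0, 1.0);  // N(0, 1) samples

    const double drift = (r - 0.5 * sigma * sigma) * T;
    const double vol = sigma * std::sqrt(T);

    double sum = 0.0, sum_sq = 0.0;
    for (std::size_t i = 0; i < paths; ++i) {
        const double z = gauss(engine);
        const double s_T = S * std::exp(drift + vol * z);  // price at expiration
        const double payoff = std::max(s_T - K, 0.0);      // call payoff
        sum += payoff;
        sum_sq += payoff * payoff;
    }

    const double n = static_cast<double>(paths);
    const double mean = sum / n;
    const double variance = (sum_sq - n * mean * mean) / (n - 1.0);
    const double discount = std::exp(-r * T);
    // Assumption: 1.96 standard errors approximates a 95% confidence width.
    return {discount * mean, discount * 1.96 * std::sqrt(variance / n)};
}
```

A call such as `monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, 200000, 42u)` returns the estimate together with its confidence width; the width shrinks roughly in proportion to the inverse square root of `paths`.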

The Black-Scholes model relies on fixed inputs (current stock price, strike price, time until expiration, volatility, risk-free rate, and dividend yield). The model is based on geometric Brownian motion with constant drift and volatility. We can calculate the price of the European put and call options explicitly using the Black–Scholes formula.

The price of a call option C in terms of the Black–Scholes parameters is

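$$C = N(d_1)\,S - N(d_2)\,PV(K)$$

where:

- $d_1 = \dfrac{1}{\sigma\sqrt{T}}\left[\log\left(\dfrac{S}{K}\right) + \left(r + \dfrac{\sigma^2}{2}\right)T\right]$

- $d_2 = d_1 - \sigma\sqrt{T}$

- $PV(K) = K\exp(-rT)$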

After repeatedly computing the appropriate averages, we obtain an estimated option price that is consistent with the analytical result from the Black-Scholes model.
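As a reference point for the Monte Carlo estimate, the closed-form call price can be evaluated directly from d1, d2 and PV(K). The following is a small host-side sketch; the function names `normal_cdf` and `black_scholes_call` are made up for this example, and computing N(x) through `std::erfc` is an implementation choice of the sketch rather than what the sample does.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cumulative distribution function N(x), via erfc.
inline double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Sketch: Black-Scholes price of a European call without dividends.
// S: stock price, K: strike, r: risk-free rate, sigma: volatility, T: years.
inline double black_scholes_call(double S, double K, double r, double sigma,
                                 double T) {
    assert(S > 0.0 && K > 0.0 && sigma > 0.0 && T > 0.0);
    const double sqrt_T = std::sqrt(T);
    const double d1 =
        (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt_T);
    const double d2 = d1 - sigma * sqrt_T;
    const double pv_K = K * std::exp(-r * T);  // present value of the strike
    return normal_cdf(d1) * S - normal_cdf(d2) * pv_K;
}
```

For an at-the-money option with `S = K = 100`, `r = 0.05`, `sigma = 0.2` and `T = 1`, the formula gives d1 = 0.35 and d2 = 0.15, so the price is a little under 10.5.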

## Migrate CUDA source to SYCL source

<p style="background-color:#cdc"> Note: A CUDA development machine is required to accomplish the task in this section </p>

Now that we have analyzed the CUDA source, we will migrate the CUDA source into SYCL source using the __SYCLomatic Tool__.

In this exercise, we will walk you through step-by-step to migrate the CUDA code.

#### Requirements

Make sure you have an __NVIDIA CUDA development machine__ that can __compile and run CUDA code__. The next step is to install the tools for migrating CUDA to SYCL:

- Install the SYCLomatic Tool on this machine
  - Go to https://github.com/oneapi-src/SYCLomatic/releases/
  - Copy the link to the latest `linux_release.tgz` from the assets
  - On the CUDA development machine: `mkdir syclomatic; cd syclomatic`
  - `wget <link to linux_release.tgz>`
  - `tar -xvf linux_release.tgz`
  - `export PATH="/home/$USER/syclomatic/bin:$PATH"`
  - Verify the installation: `c2s --version`
- Pull the CUDA samples repo to this machine
  - `git clone https://github.com/NVIDIA/cuda-samples.git`
- Compile and run the `MonteCarloMultiGPU` sample
  - `cd cuda-samples/Samples/5_Domain_Specific/MonteCarloMultiGPU/`
  - `make`


### Migrate CUDA source to SYCL source using SYCLomatic

On the NVIDIA CUDA development machine, go to the CUDA source folder and generate a compilation database with the tool `intercept-build`. This creates a JSON file that records every compiler invocation, including the names of the input files and the compiler options.

```
make clean
intercept-build make
```

This will create a file named `compile_commands.json` in the sample folder.

Next, use the SYCLomatic Tool (c2s) to migrate the code; it will store the result in the migration folder `dpct_output`:

```
c2s -p compile_commands.json --in-root ../../.. --gen-helper-function
```

The `--gen-helper-function` option copies the SYCLomatic helper header files to the output directory.

The `--in-root` option specifies the root directory of the source tree to migrate, which here also covers the common include files of the CUDA project.

This command should migrate the CUDA source to C++ SYCL source in a folder named `dpct_output` by default. The folder will contain the C++ SYCL source along with any dependencies from the `Common` folder:

- `MonteCarloMultiGPU.cpp.dp.cpp`
- `MonteCarlo_kernel.dp.cpp`
- `MonteCarlo_gold.cpp.dp.cpp`
- `multithreading.cpp`
- `MonteCarlo_reduction.dp.hpp`
- `MonteCarlo_common.h`
- `realtype.h`

This command may also emit a number of warnings about the migration process. Where CUDA code cannot be migrated automatically, the tool inserts warning comments into the migrated source files, and that code has to be migrated manually.


## Analyze, Compile and Run the migrated SYCL source

<p style="background-color:#cdc"> Note: The tasks in this section should be done on Intel DevCloud or on a system with oneAPI Base toolkit installed.</p>

The migrated SYCL files are in the `Samples` folder under the `dpct_output` folder:
- `MonteCarloMultiGPU.cpp.dp.cpp`
- `MonteCarlo_kernel.dp.cpp`
- `MonteCarlo_gold.cpp.dp.cpp`
- `multithreading.cpp`
- `MonteCarlo_reduction.dp.hpp`
- `MonteCarlo_common.h`
- `realtype.h`

The `dpct_output` folder also has the header files needed for compiling the migrated SYCL code. The `Common` folder has header files with CUDA helper functions that have been migrated to SYCL, and the `include` folder has header files with SYCLomatic helper functions.

#### Requirements

Make sure you have one of the following:
- __Development machine with Intel CPU/GPU__ with Intel oneAPI Base Toolkit installed
- __Intel Developer Cloud__ account to access the Intel CPUs/GPUs on the cloud

### Compiling migrated SYCL code

Copy the files listed above from the `dpct_output` folder on the __NVIDIA Development Machine__ to the __Intel Developer Cloud__.

To compile the migrated SYCL code we can use the following command:
```
icpx -fsycl -fsycl-targets=intel_gpu_pvc -I ../../../Common -I ../../../include *.cpp
```

There may be compile errors based on whether all of the CUDA code was migrated to SYCL or not. The migrated code may also include comments with warning messages, which could help make it easier to fix the errors. These errors have to be manually fixed to get the code to compile.

#### Build and Run
Select the cell below and click run ▶ to compile and execute the code (expect to see errors):

```
! ./q.sh run_dpct_output.sh
```



### Fixing functionally incorrect SYCL code

##### 1. DPCT1032:19: A different random number generator is used. You may need to adjust the code.
```
dpct::rng::device::rng_generator<oneapi::mkl::rng::device::mcg59<1>> *rngStates;
```
SYCLomatic migrates the CUDA RNG to the mcg59 engine, which does not produce functionally correct results for this sample. Manually switch to the philox4x32x10<1> engine to restore functional correctness.
```
oneapi::mkl::rng::device::philox4x32x10<1> *rngStates;
```

##### 2. Creating the host RNG.
```
 gen = dpct::rng::create_host_rng(dpct::rng::random_engine_type::mcg59, dpct::cpu_device().default_queue());
```

Because the generated code creates the host generator with mcg59, the engine type has to be changed to philox4x32x10 here as well.

```
gen = dpct::rng::create_host_rng(dpct::rng::random_engine_type::philox4x32x10);
```

##### 3. mcg59 is used in a few other places in the code and needs to be replaced with philox4x32x10.
```
dpct::rng::device::rng_generator<oneapi::mkl::rng::device::mcg59<1>>  *__restrict rngStates,
dpct::rng::device::rng_generator<oneapi::mkl::rng::device::mcg59<1>> localState = rngStates[tid]; 
```
```
oneapi::mkl::rng::device::philox4x32x10<1> *__restrict rngStates,
oneapi::mkl::rng::device::philox4x32x10<1> localState = rngStates[tid];
```

##### 4. DPCT1105:34: The mcg59 random number generator is used. The subsequence argument `item_ct1.get_local_id(2)` is ignored.
```
rngState[tid] = dpct::rng::device::rng_generator<oneapi::mkl::rng::device::mcg59<1>>(item_ct1.get_group(2) + item_ct1.get_group_range(2) * device_id, 0);
```

The mcg59 engine does not support a subsequence argument. This is the main reason why mcg59 cannot be used to achieve functional correctness in this sample.

```
rngState[tid] = oneapi::mkl::rng::device::philox4x32x10<1>(
item_ct1.get_group(2) + item_ct1.get_group_range(2) * device_id,
{0, static_cast<std::uint64_t>((item_ct1.get_local_id(2)) * 8)});
```
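To see why an ignored subsequence breaks the sample, consider a toy host-side generator. This is only a sketch: `ToyMcg59` and `threads_share_stream` are hypothetical names, and the recurrence (multiplier 13^13, modulus 2^59) is the one commonly documented for MCG59, used here as an assumption and not as oneMKL's actual implementation. Every work-item in a group is seeded with the same group-based value, so once the subsequence is dropped they all draw the same stream.

```cpp
#include <cassert>
#include <cstdint>

// Toy multiplicative congruential generator: x_{n+1} = a * x_n mod 2^59.
// Assumption: constants chosen to resemble MCG59; illustration only.
struct ToyMcg59 {
    static constexpr std::uint64_t kMultiplier = 302875106592253ULL;  // 13^13
    static constexpr std::uint64_t kMask = (1ULL << 59) - 1;          // mod 2^59
    std::uint64_t state;

    // The subsequence argument is accepted but ignored, which is the
    // situation the DPCT1105 warning describes.
    explicit ToyMcg59(std::uint64_t seed, std::uint64_t /*subsequence*/ = 0)
        : state(seed & kMask) {
        if (state == 0) state = 1;  // avoid the all-zero fixed point
    }

    std::uint64_t next() {
        state = (state * kMultiplier) & kMask;  // wraps mod 2^64, then mod 2^59
        return state;
    }
};

// Returns true when two work-items seeded with the same group value
// produce identical values for the first eight draws.
inline bool threads_share_stream(std::uint64_t group_seed,
                                 std::uint64_t tid_a, std::uint64_t tid_b) {
    ToyMcg59 a(group_seed, tid_a);
    ToyMcg59 b(group_seed, tid_b);
    for (int i = 0; i < 8; ++i) {
        if (a.next() != b.next()) return false;
    }
    return true;
}
```

Here `threads_share_stream(7, 0, 1)` is true: both work-items simulate exactly the same paths, so adding more work-items per group adds no new samples. An engine that honors a per-work-item offset avoids this.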

##### 5. Changing the function call to rng::device::gaussian()
```
 real r = localState.generate<oneapi::mkl::rng::device::gaussian<float>, 1>();
```
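With the oneMKL engine used directly, the call that generates a sample changes: create the distribution object and pass it, together with the engine state, to `oneapi::mkl::rng::device::generate`.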
```
oneapi::mkl::rng::device::gaussian<real> dist;

auto r = oneapi::mkl::rng::device::generate(dist, localState);

```


### Compile and Run the migrated SYCL source

Once you have successfully migrated the CUDA source to the SYCL source, verify that the migrated SYCL code is functioning correctly by compiling and running it on the Intel Developer Cloud, which has a variety of Intel CPUs and GPUs available for development.

#### Build and Run sycl_migrated
Select the cell below and click run ▶ to compile and execute the code:

```
! ./q.sh run_sycl_migrated.sh
```

### SYCL Code Migration Analysis

When comparing the CUDA code and migrated SYCL code, we can see that there are some 1:1 equivalent calls, which are listed below in the table:

| Functionality|CUDA|SYCL
|-|-|-
| library header file|`#include <curand_kernel.h>`|`#include <dpct/rng_utils.hpp>`
| header file|`#include <cuda_runtime.h>`|`#include <sycl/sycl.hpp>` <br> `#include <dpct/dpct.hpp>`
| RNG States | `curandState *rngStates;`| `oneapi::mkl::rng::device::philox4x32x10<1> *rngStates;`
| Create generator host | `curandGenerator_t gen;` <br> `curandCreateGeneratorHost(&gen, CURAND_RNG_PSEUDO_DEFAULT)`| `dpct::rng::create_host_rng(dpct::rng::random_engine_type::philox4x32x10);`
| RNG init| `curand_init(blockIdx.x + gridDim.x * device_id, threadIdx.x, 0, &rngState[tid]);`| `rngState[tid] = oneapi::mkl::rng::device::philox4x32x10<1>(item_ct1.get_group(2) + item_ct1.get_group_range(2) * device_id,{0, static_cast<std::uint64_t>((item_ct1.get_local_id(2)) * 8)});`




## Source Code

This section describes the location of the CUDA source and the contents of different SYCL source code directories in this project.

| folder name | source code description
| --- | ---
| [CUDA github](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/5_Domain_Specific/MonteCarloMultiGPU) | Original CUDA Source used for migration
| dpct_output | Contains the output of the SYCLomatic Tool after migrating the CUDA code to SYCL. This SYCL code still has some unmigrated code that must be manually fixed to get full functionality. (The code does not functionally work as generated.)
| sycl_migrated | Contains the SYCL code after the manual migration fixes have been applied.



## Summary

In this module we learned how to migrate a simple CUDA source to a functional SYCL source using `SYCLomatic`, and then analyzed and optimized the SYCL source through manual coding.