# Module 1.1 - Port an Intel® oneAPI Deep Neural Network Library (oneDNN) sample from CPU to GPU - oneDNN CNN FP32 Inference

## Learning Objectives
In this module the developer will:
* Learn how to port a oneDNN sample from a CPU-only version to a CPU&GPU version by using DPC++
* Learn how to program a simple convolutional neural network by using oneDNN


***
# Exercise : Porting a oneDNN application from CPU to GPU


## Step 1 : Introducing the oneDNN configurations inside Intel® oneAPI toolkits
oneDNN has four different configurations inside the Intel oneAPI toolkits. Each configuration is in a different folder under the oneDNN installation path, and each configuration supports different compilers or threading libraries.

Set the installation path of your Intel oneAPI toolkit.

In [None]:
%env ONEAPI_INSTALL=/opt/intel/oneapi

In [None]:
import os
if not os.path.isdir(os.environ['ONEAPI_INSTALL']):
    print("ERROR! wrong oneAPI installation path")

In [None]:
!printf '%s\n'    $ONEAPI_INSTALL/oneDNN/latest/cpu_*

As you can see, there are 4 different folders under the oneDNN installation path, and each of those configurations supports different features. This tutorial will make use of two configurations.
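The folder listing above can also be done from Python, which is handy if you want to check for a specific configuration before building. This is a minimal sketch: the helper name `list_onednn_configs` is hypothetical, and it assumes the same `oneDNN/latest/cpu_*` layout that the `printf` command above relies on.

```python
import glob
import os


def list_onednn_configs(oneapi_root):
    """Return the names of the cpu_* configuration folders under oneDNN/latest.

    Hypothetical helper; it mirrors the shell glob used in the cell above.
    """
    pattern = os.path.join(oneapi_root, "oneDNN", "latest", "cpu_*")
    return sorted(
        os.path.basename(path) for path in glob.glob(pattern) if os.path.isdir(path)
    )


if __name__ == "__main__":
    # Fall back to the default install path used in this notebook.
    root = os.environ.get("ONEAPI_INSTALL", "/opt/intel/oneapi")
    for name in list_onednn_configs(root):
        print(name)
```

The two configurations used later in this tutorial are cpu_gomp and cpu_dpcpp_gpu_dpcpp, so both names should appear in the result on a complete installation.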

First of all, create a lab folder for this exercise.

In [None]:
!mkdir -p lab

## Step 2 : Reviewing the cnn_inference_f32.cpp code, which supports only CPU

This C++ API example demonstrates how to build an AlexNet neural network topology for forward-pass inference, and it can run only on CPU.
You can find a detailed code explanation at this [link](https://oneapi-src.github.io/oneDNN/cnn_inference_f32_cpp.html)

The file cnn_inference_f32.cpp contains the CPU-only implementation.
Copy it into the lab folder and use it as the base of the lab.


In [None]:
!cp codes_for_ipynb/cnn_inference_f32.cpp lab/

You can inspect the source file using the following command, but we recommend reading the detailed code explanation at this [link](https://oneapi-src.github.io/oneDNN/cnn_inference_f32_cpp.html) instead.

In [None]:
!cat lab/cnn_inference_f32.cpp 

Then, copy the required CMake file into the lab folder.

In [None]:
!cp $ONEAPI_INSTALL/oneDNN/latest/cpu_gomp/examples/CMakeLists.txt lab/

## Step 3 : Build and Execution


### Build and Run with GNU Compiler and OpenMP 
This CPU-only AlexNet forward-pass inference sample is built with the GNU compiler.
The following section shows how to build with G++ and run on CPU.

#### Script - build.sh
The script **build.sh** encapsulates the compiler command and flags that generate the executable.

In [None]:
%%writefile build.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh --dnnl-configuration=cpu_gomp  --force > /dev/null 2>&1
export EXAMPLE_ROOT=./lab/
mkdir cpu_gomp
cd cpu_gomp
cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DDNNL_CPU_RUNTIME=OMP -DDNNL_GPU_RUNTIME=NONE
make cnn-inference-f32-cpp



Once the compilation completes without errors, you can execute your program on the Intel DevCloud or in a local environment.

#### Script - run.sh
The script **run.sh** encapsulates the program for submission to the job queue for execution.
The user must switch to the G++ oneDNN configuration by passing the custom configuration "--dnnl-configuration=cpu_gomp" when running "source setvars.sh".

By default, the oneDNN verbose log is disabled.
You can uncomment #export DNNL_VERBOSE=1 to enable the oneDNN verbose log.

In [None]:
%%writefile run.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh --dnnl-configuration=cpu_gomp  --force > /dev/null 2>&1
echo "########## Executing the run"
# uncomment the line below to enable oneDNN verbose log
#export DNNL_VERBOSE=1
./cpu_gomp/out/cnn-inference-f32-cpp
echo "########## Done with the run"




#### Submitting **build.sh** and **run.sh** to the job queue
Now we can submit the **build.sh** and **run.sh** to the job queue.

##### NOTE - it is possible to execute any of the build and run commands in local environments.
To let users run their scripts either on the DevCloud or in a local environment, this and subsequent training modules check for the existence of the job submission command **qsub**. If the check fails, the build and run are assumed to be local.

In [None]:
!rm -rf cpu_gomp; chmod 755 q; chmod 755 build.sh; chmod 755 run.sh;if [ -x "$(command -v qsub)" ]; then ./q build.sh; ./q run.sh; else ./build.sh; ./run.sh; fi
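The qsub check in the cell above can be sketched in Python as follows. This is only an illustration of the decision logic, not part of the sample: the function name `build_commands` and its `which` parameter are hypothetical, while `./q` is the submission wrapper already used in the command above.

```python
import shutil


def build_commands(scripts, submit_wrapper="./q", which=shutil.which):
    """Return the command lines to run for each script.

    If an executable named qsub is found on PATH, each script is routed
    through the submission wrapper; otherwise it is run directly.
    The `which` parameter exists only so the lookup can be replaced in tests.
    """
    use_queue = which("qsub") is not None
    return [f"{submit_wrapper} {s}" if use_queue else f"./{s}" for s in scripts]


if __name__ == "__main__":
    for command in build_commands(["build.sh", "run.sh"]):
        print(command)
```

On a machine without qsub this yields the direct invocations ./build.sh and ./run.sh, which matches the else branch of the shell one-liner.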

#### Enable oneDNN Verbose log and check the engine kind for each operation
With the verbose log enabled, each executed operation prints one line, and the engine kind is the field right after "dnnl_verbose,exec,". For this build, cpu should be the engine kind for most of the operations.
Check this [link](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html) for a detailed explanation of the oneDNN verbose log.

Below is an example of the oneDNN verbose log for a convolution on CPU:

dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_inference,src_f32::blocked:abcd:f0 wei_f32::blocked:Acdb8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,alg:convolution_direct,mb1_ic3oc96_ih227oh55kh11sh4dh0ph0_iw227ow55kw11sw4dw0pw0,0.458008
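To make the field layout concrete, here is a minimal Python sketch that pulls the engine kind, primitive name, and implementation name out of one verbose line. The function name `parse_exec_line` is hypothetical, and the sketch assumes the comma-separated layout of the example line above; other oneDNN versions may add or reorder fields, so only the leading fields are read.

```python
def parse_exec_line(line):
    """Parse one oneDNN verbose line of the form shown above.

    Returns a dict with the engine kind, primitive, and implementation,
    or None if the line is not a "dnnl_verbose,exec,..." line.
    """
    fields = line.strip().split(",")
    if len(fields) < 5 or fields[0] != "dnnl_verbose" or fields[1] != "exec":
        return None
    return {
        "engine": fields[2],
        "primitive": fields[3],
        "implementation": fields[4],
    }


if __name__ == "__main__":
    # Sample line copied from the CPU example in this notebook.
    sample = (
        "dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_inference,"
        "src_f32::blocked:abcd:f0 wei_f32::blocked:Acdb8a:f0 "
        "bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,"
        "alg:convolution_direct,"
        "mb1_ic3oc96_ih227oh55kh11sh4dh0ph0_iw227ow55kw11sw4dw0pw0,0.458008"
    )
    print(parse_exec_line(sample))
```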

##  Step 4 : Modifying the cnn_inference_f32.cpp code to support both CPU and GPU

In this section, we will convert the above cnn_inference_f32.cpp to support both CPU and GPU and compile the sample with DPC++ instead of G++.

Converting this sample from CPU to GPU takes three steps.

* Step 1 : change engine::kind from CPU to GPU
* Step 2 : implement a function to access GPU memory via SYCL buffer and its accessor
* Step 3 : write user's data into GPU memory via the implemented function from Step 2

There is a cnn_inference_f32.patch file inside the codes_for_ipynb folder. It contains all the changes needed to port the CPU-only version of cnn_inference_f32.cpp to GPU.
First, apply the patch to the cnn_inference_f32.cpp in the lab folder.

In [None]:
!cd lab;patch < ../codes_for_ipynb/cnn_inference_f32.patch;

Users can check the source file using the following command.

In [None]:
!cat lab/cnn_inference_f32.cpp 

You can find the related modifications in the patched cnn_inference_f32.cpp; the modifications for each step are wrapped with ">>>>>>" and "<<<<<<".

### Step 1 : change engine::kind from CPU to GPU
Change the engine kind from CPU to GPU during engine instantiation.
* Before patching : engine eng(engine::kind::cpu, 0);
* After patching : engine eng(engine::kind::gpu, 0);

### Step 2 : implement a function to access GPU memory via SYCL buffer and its accessor
Refer to the function write_to_dnnl_memory in the patched source for this step.
Overall, we use a SYCL buffer and its accessor to access GPU memory:

    auto buffer = mem.get_sycl_buffer<uint8_t>();
    auto dst = buffer.get_access<cl::sycl::access::mode::write>();

### Step 3 : write user's data into GPU memory via the implemented function from Step 2
User data cannot be written into GPU memory through a host pointer, so we use the write_to_dnnl_memory function instead. The call sites are marked in the patched source.

### Build and Run with oneAPI DPC++ Compiler 
This AlexNet forward-pass inference sample on GPU is built with the DPC++ compiler.
The following section shows how to build with DPC++ and run on GPU.

#### Script - build.sh
The script **build.sh** encapsulates the compiler command and flags that generate the executable.

In [None]:
%%writefile build.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh  --dnnl-configuration=cpu_dpcpp_gpu_dpcpp --force > /dev/null 2>&1
export EXAMPLE_ROOT=./lab/
mkdir dpcpp
cd dpcpp
cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=dpcpp -DDNNL_CPU_RUNTIME=SYCL -DDNNL_GPU_RUNTIME=SYCL
make cnn-inference-f32-cpp



Once the compilation completes without errors, you can execute your program on the DevCloud or in a local environment.

#### Script - run.sh
The script **run.sh** encapsulates the program for submission to the job queue for execution.

By default, the oneDNN verbose log is disabled.
You can uncomment #export DNNL_VERBOSE=1 to enable the oneDNN verbose log.

In [None]:
%%writefile run.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh  --dnnl-configuration=cpu_dpcpp_gpu_dpcpp --force > /dev/null 2>&1
echo "########## Executing the run"
#export DNNL_VERBOSE=1
./dpcpp/out/cnn-inference-f32-cpp gpu
echo "########## Done with the run"



#### Submitting **build.sh** and **run.sh** to the job queue
Now we can submit the **build.sh** and **run.sh** to the job queue.

##### NOTE - it is possible to execute any of the build and run commands in local environments.
To let users run their scripts either on the DevCloud or in a local environment, this and subsequent training modules check for the existence of the job submission command **qsub**. If the check fails, the build and run are assumed to be local.

In [None]:
!rm -rf dpcpp; chmod 755 q; chmod 755 build.sh; chmod 755 run.sh;if [ -x "$(command -v qsub)" ]; then ./q build.sh; ./q run.sh; else ./build.sh; ./run.sh; fi

#### Enable oneDNN Verbose log and check the engine kind for each operation
With the verbose log enabled, the engine kind is the field right after "dnnl_verbose,exec," for each operation. For this build, gpu should be the engine kind for most of the operations.
Check this [link](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html) for a detailed explanation of the oneDNN verbose log.

Below is an example of the oneDNN verbose log for a convolution on GPU:

dnnl_verbose,exec,gpu,convolution,ocl:gen9:blocked,forward_inference,src_f32::blocked:abcd:f0 wei_f32::blocked:Acdb16a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd16b:f0,,alg:convolution_direct,mb1_ic3oc96_ih227oh55kh11sh4dh0ph0_iw227ow55kw11sw4dw0pw0
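To confirm that most operations ran on the expected engine, you could tally the engine kinds over a whole captured log. The sketch below is one possible way to do that in Python. The function name `count_engine_kinds` is hypothetical, and it assumes the log was saved as text with one "dnnl_verbose,exec,..." line per executed operation, as in the examples in this notebook.

```python
from collections import Counter


def count_engine_kinds(log_text):
    """Count how many executed operations ran on each engine kind.

    Only lines starting with "dnnl_verbose,exec," are counted; script
    banners and other output are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) > 2 and fields[0] == "dnnl_verbose" and fields[1] == "exec":
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed workflow): ./run.sh > run.log, then pass run.log here.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log_file:
            for engine, count in count_engine_kinds(log_file.read()).most_common():
                print(engine, count)
```

After the GPU build, the gpu count would be expected to dominate; a log made up mostly of cpu entries would suggest that the CPU-only binary or configuration was run instead.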

***
# Summary
In this lab, the developer learned the following:
* How to port a oneDNN sample from a CPU-only version to a CPU&GPU version
* How to program a simple convolutional neural network by using oneDNN