# Intel oneAPI MKL Training

##### Sections
* [Learning Objectives](#Learning-Objectives)
* [oneMKL Overview](#oneMKL-Overview)
* [Prerequisites](#Prerequisites)
* [oneMKL With DPC++](#oneMKL-With-DPC++)
* [oneMKL With OpenMP Offload](#oneMKL-With-OpenMP-Offload)
* [Modules](#Modules)

## Learning Objectives
* Understand how the __Intel oneAPI Math Kernel Library (oneMKL)__ fits within the __oneAPI programming model__ for heterogeneous computing
* Know the difference between the __Data Parallel C++ (DPC++)__ and __OpenMP Offload__ approaches to oneMKL and when to use each one
* Get __hands-on__ experience with common oneMKL routines

## oneMKL Overview
oneMKL provides optimized scientific computing routines familiar to users of the Intel Math Kernel Library (MKL). It extends these routines to heterogeneous computing through two interfaces: DPC++ and OpenMP Offload.

Each interface suits a different use case. Generally, users creating new data-parallel projects *or* migrating CUDA or OpenCL projects should opt for DPC++, while those updating legacy C or Fortran code should use OpenMP Offload.

## Prerequisites
The following courses introduce the use of oneAPI with DPC++ and with OpenMP Offload. They also motivate each method and serve as a foundation for the material in this lab.

* [Essentials of Data Parallel C++](https://software.intel.com/content/www/us/en/develop/tools/oneapi/training/dpc-essentials.html)
* [OpenMP* Offload Basics](https://software.intel.com/content/www/us/en/develop/tools/oneapi/training/openmp-offload.html)

oneMKL simplifies the use of the oneAPI programming model and handles much of the work for users. As such, it is *not* necessary to work through all of the training modules in the *Essentials of Data Parallel C++* lab. The minimum recommended DPC++ training modules to complete before starting this lab are:

* oneAPI_Essentials/01_oneAPI_Intro
* oneAPI_Essentials/02_DPCPP_Program_structure
* oneAPI_Essentials/03_DPCPP_Unified_Shared_Memory

For the OpenMP Offload approach, it is worthwhile to work through all training modules in the *OpenMP Offload Basics* lab.

## oneMKL With DPC++
The oneMKL DPC++ interface allows DPC++ programs to take advantage of oneMKL routines. When working with DPC++, we must keep track of a few important components, including:

* __Device(s)__ on which oneMKL functions will execute
* __Queue__ to schedule submission of tasks to device(s)

oneMKL also supports two DPC++ memory management models:

1. __Buffers__ and __accessors__
2. __Unified shared memory__

A typical DPC++ program requires the user to create a __kernel__, contained within a __command group__. The user must then submit the __command group__ to the __queue__, scheduling its execution on the given __device__. 

oneMKL provides a simpler path. Instead of the traditional approach, the user need only create a __queue__ and pass it to a oneMKL function call. The function selects a pre-written kernel, optimized for the chosen device, and submits it to the queue. There is *no* need to write a __kernel__ or __command group__.
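
The queue-only workflow described above can be sketched as follows. This is a minimal illustration, not the code used in the modules: the helper name `multiply`, the `SKETCH_HAS_ONEMKL` macro, and the choice of row-major unified shared memory are assumptions made for the example. The preprocessor guard lets the same file build with a plain C++ compiler, in which case a reference loop stands in for the library call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Use oneMKL only when compiling as SYCL and the headers are available.
#if defined(SYCL_LANGUAGE_VERSION) && __has_include(<sycl/sycl.hpp>) && \
    __has_include(<oneapi/mkl.hpp>)
#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#define SKETCH_HAS_ONEMKL 1  // hypothetical macro, local to this sketch
#endif

// Hypothetical helper: returns C = A * B for row-major matrices,
// where A is m x k, B is k x n, and C is m x n.
std::vector<double> multiply(const std::vector<double>& A,
                             const std::vector<double>& B,
                             std::int64_t m, std::int64_t n, std::int64_t k) {
    assert(A.size() == static_cast<std::size_t>(m * k));
    assert(B.size() == static_cast<std::size_t>(k * n));
    std::vector<double> C(static_cast<std::size_t>(m * n), 0.0);

#ifdef SKETCH_HAS_ONEMKL
    sycl::queue q;  // default device selection

    // Unified shared memory: pointers usable on both host and device.
    double* a = sycl::malloc_shared<double>(A.size(), q);
    double* b = sycl::malloc_shared<double>(B.size(), q);
    double* c = sycl::malloc_shared<double>(C.size(), q);
    std::copy(A.begin(), A.end(), a);
    std::copy(B.begin(), B.end(), b);
    std::copy(C.begin(), C.end(), c);

    // No kernel or command group is written here: the queue is passed to
    // the library call, and we wait on the event it returns.
    namespace blas = oneapi::mkl::blas::row_major;
    blas::gemm(q, oneapi::mkl::transpose::nontrans,
               oneapi::mkl::transpose::nontrans,
               m, n, k, 1.0, a, k, b, n, 0.0, c, n).wait();

    std::copy(c, c + C.size(), C.begin());
    sycl::free(a, q);
    sycl::free(b, q);
    sycl::free(c, q);
#else
    // Fallback when oneMKL is unavailable: plain triple loop on the host.
    for (std::int64_t i = 0; i < m; ++i)
        for (std::int64_t j = 0; j < n; ++j)
            for (std::int64_t p = 0; p < k; ++p)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
#endif
    return C;
}
```

With the oneAPI toolchain, a file like this would typically be built with the SYCL and oneMKL options enabled (for example `icpx -fsycl -qmkl`); the exact flags depend on the installed version.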

## oneMKL With OpenMP Offload
The OpenMP Offload approach interfaces well with existing C code, allowing programs to execute on GPUs with __minimal__ changes to the source. OpenMP Offload utilizes __directives__ in the form of `#pragma` statements. The *OpenMP Offload Basics* lab linked above explores these directives in greater detail. The following modules will explain how to target the OpenMP Offload interface for oneMKL, and how to set up the necessary `#pragma` statements for each routine.
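
To give a feel for how little the source changes, here is a minimal sketch of a directive-based offload using only standard OpenMP. It deliberately does *not* call oneMKL: the hand-written loop and the function name `gemm_offload` are stand-ins chosen so the example compiles without the library, and the oneMKL-specific directives are left to the modules. Without OpenMP enabled, or without an offload device, the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a library routine: computes C = A * B for
// row-major matrices (A is m x k, B is k x n, C is m x n).
void gemm_offload(const double* a, const double* b, double* c,
                  int m, int n, int k) {
    assert(m > 0 && n > 0 && k > 0);

    // The only offload-specific code is this directive. The map clauses
    // copy the inputs to the device and the result back to the host.
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:m*k], b[0:k*n]) map(from: c[0:m*n])
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

Removing the `#pragma` line leaves ordinary serial code, which is why this approach suits incremental updates to existing programs.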

## Modules
Each module is a self-contained lab explaining the usage of a specific oneMKL routine. Further, each module shows the usage of a given operation under three different paradigms:
1. DPC++ with buffer/accessor memory model
2. DPC++ with unified shared memory model
3. OpenMP Offload

### 00 - [Matrix Multiplication (GEMM)](./00_GEMM/00_GEMM.ipynb)
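
For reference, the standard BLAS definition of the general matrix multiply (GEMM) operation is given below; the module itself covers how oneMKL exposes it.

$$
C \leftarrow \alpha \, \mathrm{op}(A) \, \mathrm{op}(B) + \beta \, C
$$

Here $\alpha$ and $\beta$ are scalars, $\mathrm{op}(X)$ is $X$, $X^T$, or $X^H$, and the shapes are $m \times k$ for $\mathrm{op}(A)$, $k \times n$ for $\mathrm{op}(B)$, and $m \times n$ for $C$.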

## Summary

As the modules above show, oneMKL gives users an easy way to use heterogeneous computing platforms. Whether through DPC++ for new applications or OpenMP Offload for legacy code, oneMKL provides a means to accelerate scientific computing workloads.

Hopefully, you can now:

* Understand the use of oneMKL within the oneAPI framework
* Utilize DPC++ to take advantage of heterogeneous computing systems
* Execute oneMKL routines on a GPU with OpenMP Offload

<html><body><span style="color:green"><h1>Survey</h1></span></body></html>

[We would appreciate any feedback you’d care to give, so that we can improve the overall training quality and experience. Thanks!](https://intel.az1.qualtrics.com/jfe/form/SV_3elZDqbEP3ZcXC5)