# Intel® Open VKL - GPU

## Module Overview

This module gets new users started with Intel® Open VKL. It demonstrates how to initialize Open VKL, create a volume and change its type, render the volume, and create isosurfaces, all while targeting the GPU.

## Learning Objectives

* Learn the overview of Intel® Open VKL.
* Learn the architecture that Open VKL can target.
* Learn how to initialize an Intel® Open VKL device.
* Learn how to create a volume, set its parameters and voxel data, and commit it to the device.
* Learn how to sample the volume to render a slice, ray march through it, and iterate over isosurface hits.
* Learn how to release the volume, sampler, and device resources.
* Learn how to target GPU architecture with Intel® Open VKL.


***

### 1. Intel® Open VKL Overview

Intel® Open Volume Kernel Library (Intel® Open VKL) is a collection of high-performance volume computation kernels, developed at Intel. The target users of Open VKL are graphics application engineers who want to improve the performance of their volume rendering applications by leveraging Open VKL’s performance-optimized kernels, which include volume traversal and sampling functionality for a variety of volumetric data formats. Open VKL supports x86 CPUs under Linux, macOS, and Windows; ARM CPUs on macOS; as well as Intel® GPUs under Linux and Windows (currently in beta). Open VKL is part of the Intel® oneAPI Rendering Toolkit and is released under the permissive Apache 2.0 license.



***

### 2. Supported Platforms by Open VKL

Open VKL contains kernels optimized for the latest x86 processors with support for SSE, AVX, AVX2, and AVX-512 instructions, and for ARM processors with support for NEON instructions. Open VKL supports Intel GPUs based on the Xe HPG microarchitecture (Intel® Arc™ GPU) under Linux and Windows and Xe HPC microarchitecture (Intel® Data Center GPU Flex Series and Intel® Data Center GPU Max Series) under Linux. Intel GPU support leverages the SYCL open standard programming language; SYCL allows one to write C++ code that can be run on various devices, such as CPUs and GPUs.

***

### 3. Open VKL Features

Open VKL provides a C-based API on CPU and GPU. On CPU, it also provides an ISPC interface to the core volume algorithms, which supports applications written with the Intel® Implicit SPMD Program Compiler (Intel® ISPC). This makes it possible to write a renderer in ISPC that automatically vectorizes and leverages SSE, AVX, AVX2, AVX-512, and NEON instructions. ISPC also supports runtime code selection, so it selects the best code path for the CPU the application runs on.

In addition to the volume kernels, Open VKL provides tutorials and example renderers to demonstrate how to best use the Open VKL API.


Open VKL provides data structures that store volumetric fields. It also provides algorithms:
- Point sampling
- Interval iterators
- Hit iterators

Open VKL is designed to provide high performance in scientific visualization and production rendering.
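Of the three algorithms listed above, interval iteration is the one the minimal examples below do not exercise. The following is a conceptual sketch only: `Range`, `Interval`, and `iterateIntervals` are hypothetical names, not Open VKL API, and the real library works on the volume's internal data structures rather than on a 1D function. It shows the idea of reporting only those ray segments whose value range overlaps a range of interest, so a renderer can skip the rest.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration; these are not Open VKL API types.
struct Range {
  float lower;
  float upper;
};

struct Interval {
  Range tRange;      // span of the ray parameter t covered by this interval
  Range valueRange;  // estimated min/max field value inside that span
};

// Splits tRange into numCells equal cells and keeps only the cells whose
// value range overlaps valuesOfInterest. The value range is estimated from
// the two cell end points, which is only exact for fields that are
// monotonic within a cell (a simplification made for this sketch).
inline std::vector<Interval> iterateIntervals(
    const std::function<float(float)> &field,
    Range tRange,
    Range valuesOfInterest,
    std::size_t numCells)
{
  std::vector<Interval> result;
  const float dt = (tRange.upper - tRange.lower) / static_cast<float>(numCells);
  for (std::size_t i = 0; i < numCells; ++i) {
    const float t0 = tRange.lower + dt * static_cast<float>(i);
    const float t1 = t0 + dt;
    const float v0 = field(t0);
    const float v1 = field(t1);
    const Range values = {std::min(v0, v1), std::max(v0, v1)};
    const bool overlaps = values.lower <= valuesOfInterest.upper &&
                          values.upper >= valuesOfInterest.lower;
    if (overlaps) {
      result.push_back(Interval{{t0, t1}, values});
    }
  }
  return result;
}
```

A renderer would then sample only inside the returned intervals instead of stepping through the whole ray.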

***

### 4. Intel® Open Volume Kernel Library: Minimal Examples

This notebook contains a sequence of minimal code examples that make use of Open VKL. These examples are designed to be read and understood in sequence; each example builds upon the previous one.

The examples provided are:

- minimal_01.cpp: prerequisite code infrastructure for managing a framebuffer, using a transfer function, and drawing the framebuffer to the terminal
- minimal_02.cpp: initializing Open VKL
- minimal_03.cpp: instantiating a VKL volume, sampler, and rendering a slice
- minimal_04.cpp: changing volume types
- minimal_05.cpp: creating a ray marching volume renderer
- minimal_06.cpp: creating an isosurface renderer

For more complex examples, see the vklExamples application and corresponding code.

***

#### 4.1 minimal_01.cpp

Prerequisite code infrastructure for managing a framebuffer, using a transfer function, and drawing the frame buffer to the terminal.


In [1]:
%%writefile src/minimal_01.cpp

// Copyright 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#ifdef USE_GPU
#include "gpu.h"
#endif

#include "framebuffer.h"

int main(int argc, char **argv)
{
#ifdef USE_GPU
  // on GPU we need to create a SYCL queue.
  sycl::queue syclQueue = initSyclQueue();
#endif

#ifdef USE_GPU
  // on GPU, we provide the SYCL queue to facilitate GPU memory allocations.
  Framebuffer<AllocatorSycl<Pixel>> fb(64, 32, syclQueue);
#else
  Framebuffer<> fb(64, 32);
#endif

  fb.generate([=](float fx, float fy) { return transferFunction(2 * fx - 1); });
  fb.drawToTerminal();

  return 0;
}

Overwriting src/minimal_01.cpp
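The helpers in `framebuffer.h` are not shown in this module. As a rough mental model, the sketch below shows what a CPU-only `generate` and `transferFunction` could look like. Everything in it is an assumption: `SimpleFramebuffer` is a hypothetical name, the colors are chosen only for illustration, and the real header may differ (for example, it also supports SYCL allocation and terminal drawing).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the helpers in framebuffer.h.
struct Color {
  float r, g, b, a;
};

// Maps a scalar in [-1, 1] to a color: negative values shade toward orange,
// positive values toward blue, and opacity grows with the magnitude.
inline Color transferFunction(float value)
{
  const float m = std::fmin(std::fabs(value), 1.f);
  if (value < 0.f) {
    return {0.9f * m, 0.4f * m, 0.1f * m, m};
  }
  return {0.f, 0.4f * m, 0.6f * m, m};
}

class SimpleFramebuffer {
 public:
  SimpleFramebuffer(std::size_t width, std::size_t height)
      : width_(width), height_(height), pixels_(width * height)
  {
  }

  // Calls fn(fx, fy) once per pixel with normalized coordinates in [0, 1)
  // and stores the returned color.
  template <typename Fn>
  void generate(Fn fn)
  {
    for (std::size_t y = 0; y < height_; ++y) {
      for (std::size_t x = 0; x < width_; ++x) {
        const float fx = static_cast<float>(x) / static_cast<float>(width_);
        const float fy = static_cast<float>(y) / static_cast<float>(height_);
        pixels_[y * width_ + x] = fn(fx, fy);
      }
    }
  }

  const Color &at(std::size_t x, std::size_t y) const
  {
    return pixels_[y * width_ + x];
  }

 private:
  std::size_t width_;
  std::size_t height_;
  std::vector<Color> pixels_;
};
```

With this model, `fb.generate(...)` in minimal_01.cpp simply evaluates the transfer function on a horizontal ramp from -1 to 1.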


***

#### 4.2 minimal_02.cpp

First steps to initialize Open VKL.


In [2]:
%%writefile src/minimal_02.cpp

// Copyright 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#ifdef USE_GPU
#include "gpu.h"
#endif

#include "framebuffer.h"

// We must include the openvkl header.
#include <openvkl/openvkl.h>

#include <openvkl/device/openvkl.h>

int main(int argc, char **argv)
{
#ifdef USE_GPU
  sycl::queue syclQueue = initSyclQueue();
#endif

  // To initialize Open VKL, load the device module, which is essentially the
  // backend implementation. Our current release supports a "cpu" device
  // which is highly optimized for vector CPU architectures, and a "gpu" device
  // optimized for GPUs.
  vklInit();

#ifndef USE_GPU
  // The device itself manages all resources. "cpu" selects the native
  // vector width for best performance.
  VKLDevice device = vklNewDevice("cpu");
#else
  // For GPU, we need to provide a SYCL context.
  VKLDevice device          = vklNewDevice("gpu");
  sycl::context syclContext = syclQueue.get_context();

  vklDeviceSetVoidPtr(device, "syclContext", static_cast<void *>(&syclContext));
#endif

  // Devices must be committed before use. This is because they support
  // parameters, such as logging verbosity.
  vklCommitDevice(device);

#ifdef USE_GPU
  // debug: see if this resolves link errors in GPU device
  VKLVolume volume = vklNewVolume(device, "structuredRegular");
  vklCommit(volume);
#endif

#ifdef USE_GPU
  Framebuffer<AllocatorSycl<Pixel>> fb(64, 32, syclQueue);
#else
  Framebuffer<> fb(64, 32);
#endif

  fb.generate([=](float fx, float fy) { return transferFunction(2 * fx - 1); });
  fb.drawToTerminal();

  // When the application is done with the device, release it!
  // This will clean up the internal state.
  vklReleaseDevice(device);

  return 0;
}


Overwriting src/minimal_02.cpp
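The example releases the device with an explicit call at the end of `main`. In larger programs with several return paths, one option is a small scope guard that runs the release call automatically. The sketch below is a generic C++ helper, not part of Open VKL; `ScopeExit` and `makeScopeExit` are hypothetical names.

```cpp
#include <utility>

// Hypothetical helper (not part of Open VKL): runs a callable when the guard
// goes out of scope, so cleanup happens on every return path.
template <typename Fn>
class ScopeExit {
 public:
  explicit ScopeExit(Fn fn) : fn_(std::move(fn)) {}

  // Moving transfers responsibility for the cleanup to the new guard.
  ScopeExit(ScopeExit &&other)
      : fn_(std::move(other.fn_)), active_(other.active_)
  {
    other.active_ = false;
  }

  ScopeExit(const ScopeExit &)            = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ScopeExit &operator=(ScopeExit &&)      = delete;

  ~ScopeExit()
  {
    if (active_) {
      fn_();
    }
  }

 private:
  Fn fn_;
  bool active_ = true;
};

template <typename Fn>
ScopeExit<Fn> makeScopeExit(Fn fn)
{
  return ScopeExit<Fn>(std::move(fn));
}
```

In minimal_02.cpp this could be used as `auto releaseDevice = makeScopeExit([&] { vklReleaseDevice(device); });` right after the device is created. Guards are destroyed in reverse order of declaration, so guards for volumes and samplers declared later would run before the device is released.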


***

#### 4.3 minimal_03.cpp

Instantiating a VKL volume, sampler, and rendering a slice.


In [3]:
%%writefile src/minimal_03.cpp

// Copyright 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#ifdef USE_GPU
#include "gpu.h"
#endif

#include "create_voxels.h"
#include "framebuffer.h"

// We must include the openvkl header.
#include <openvkl/openvkl.h>

#include <openvkl/device/openvkl.h>

int main(int argc, char **argv)
{
#ifdef USE_GPU
  sycl::queue syclQueue = initSyclQueue();
#endif

  vklInit();

#ifndef USE_GPU
  VKLDevice device = vklNewDevice("cpu");
#else
  VKLDevice device          = vklNewDevice("gpu");
  sycl::context syclContext = syclQueue.get_context();

  vklDeviceSetVoidPtr(device, "syclContext", static_cast<void *>(&syclContext));
#endif

  vklCommitDevice(device);

  // "Load data from disk". (We generate the array procedurally).
  constexpr size_t res      = 128;
  std::vector<float> voxels = createVoxels(res);

  // Note that Open VKL uses a C99 API for maximum compatibility.
  // So we will have to wrap the array we just created so that
  // we can pass it to Open VKL.

  // Create a new volume. Volume objects are created on a device.
  // We create a structured regular grid here, which is essentially
  // a dense 3D array.
  VKLVolume volume = vklNewVolume(device, "structuredRegular");

  // We have to set a few parameters on the volume.
  // First, Open VKL needs to know the extent of the volume:
  vklSetVec3i(volume, "dimensions", res, res, res);

  // By default, the volume assumes a voxel size of 1. Scale it so the
  // domain is [0, 1].
  const float spacing = 1.f / static_cast<float>(res);
  vklSetVec3f(volume, "gridSpacing", spacing, spacing, spacing);

  // Open VKL has a concept of typed Data objects. That's how we pass data
  // buffers to a device.
  VKLData voxelData =
      vklNewData(device, voxels.size(), VKL_FLOAT, voxels.data());

  // Set the data parameter. We can release the data directly afterwards
  // as Open VKL has a reference counting mechanism and will keep track
  // internally.
  vklSetData(volume, "data", voxelData);
  vklRelease(voxelData);

  // Finally, commit. This may build acceleration structures, etc.
  vklCommit(volume);

  // Instead of drawing the field directly into our framebuffer, we will instead
  // sample the volume we just created. To do that, we need a sampler object.
  VKLSampler sampler = vklNewSampler(volume);
  vklCommit(sampler);

#ifdef USE_GPU
  Framebuffer<AllocatorSycl<Pixel>> fb(64, 32, syclQueue);
#else
  Framebuffer<> fb(64, 32);
#endif

  fb.generate([=](float fx, float fy) {
    // To sample, we call vklComputeSample on our sampler object.
    const vkl_vec3f p = {fx, fy, 0.f};
    return transferFunction(vklComputeSample(&sampler, &p));
  });

  fb.drawToTerminal();

  // Release the volume to clean up!
  vklRelease(sampler);
  vklRelease(volume);
  vklReleaseDevice(device);

  return 0;
}


Overwriting src/minimal_03.cpp
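To make the slice example more concrete, the sketch below shows one way a dense voxel array could be generated and sampled by hand. It rests on several assumptions: `createLinearVoxels` is a hypothetical stand-in for `createVoxels` (whose actual field is not shown here), the array is laid out with x varying fastest, voxel (i, j, k) sits at (i, j, k) times the grid spacing, and sampling uses trilinear interpolation with clamping at the borders. It is not Open VKL's implementation of `vklComputeSample`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for createVoxels(): fills a res^3 grid with the
// field f(x, y, z) = x / res, stored with x varying fastest.
inline std::vector<float> createLinearVoxels(std::size_t res)
{
  std::vector<float> voxels(res * res * res);
  for (std::size_t z = 0; z < res; ++z)
    for (std::size_t y = 0; y < res; ++y)
      for (std::size_t x = 0; x < res; ++x)
        voxels[x + res * (y + res * z)] =
            static_cast<float>(x) / static_cast<float>(res);
  return voxels;
}

// Trilinear interpolation at object-space point (px, py, pz).
// Requires res >= 2. Points outside the grid are clamped to its border,
// which is a simplification made for this sketch.
inline float sampleTrilinear(const std::vector<float> &voxels,
                             std::size_t res,
                             float spacing,
                             float px,
                             float py,
                             float pz)
{
  const float maxIndex = static_cast<float>(res - 1);
  // Continuous grid coordinates, clamped to [0, res - 1].
  const float gx = std::min(std::max(px / spacing, 0.f), maxIndex);
  const float gy = std::min(std::max(py / spacing, 0.f), maxIndex);
  const float gz = std::min(std::max(pz / spacing, 0.f), maxIndex);

  // Lower corner of the enclosing cell and the fractional offsets within it.
  const std::size_t x0 = std::min(static_cast<std::size_t>(gx), res - 2);
  const std::size_t y0 = std::min(static_cast<std::size_t>(gy), res - 2);
  const std::size_t z0 = std::min(static_cast<std::size_t>(gz), res - 2);
  const float fx = gx - static_cast<float>(x0);
  const float fy = gy - static_cast<float>(y0);
  const float fz = gz - static_cast<float>(z0);

  auto at = [&](std::size_t x, std::size_t y, std::size_t z) {
    return voxels[x + res * (y + res * z)];
  };
  auto lerp = [](float a, float b, float t) { return a + t * (b - a); };

  // Interpolate along x on the four cell edges, then along y, then along z.
  const float c00 = lerp(at(x0, y0, z0), at(x0 + 1, y0, z0), fx);
  const float c10 = lerp(at(x0, y0 + 1, z0), at(x0 + 1, y0 + 1, z0), fx);
  const float c01 = lerp(at(x0, y0, z0 + 1), at(x0 + 1, y0, z0 + 1), fx);
  const float c11 = lerp(at(x0, y0 + 1, z0 + 1), at(x0 + 1, y0 + 1, z0 + 1), fx);
  return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz);
}
```

Letting the library do this work is what allows the same sampling call to be reused unchanged when the volume type changes in the next example.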


*** 

#### 4.4 minimal_04.cpp

Changing the volume type to a spherical one.


In [4]:
%%writefile src/minimal_04.cpp

// Copyright 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#ifdef USE_GPU
#include "gpu.h"
#endif

#include "create_voxels.h"
#include "framebuffer.h"

#include <openvkl/openvkl.h>

#include <openvkl/device/openvkl.h>

int main(int argc, char **argv)
{
#ifdef USE_GPU
  sycl::queue syclQueue = initSyclQueue();
#endif

  vklInit();

#ifndef USE_GPU
  VKLDevice device = vklNewDevice("cpu");
#else
  VKLDevice device          = vklNewDevice("gpu");
  sycl::context syclContext = syclQueue.get_context();

  vklDeviceSetVoidPtr(device, "syclContext", static_cast<void *>(&syclContext));
#endif

  vklCommitDevice(device);

  constexpr size_t res      = 128;
  std::vector<float> voxels = createVoxels(res);

  // One advantage of Open VKL is that we can use a different data structure
  // with the same sampling API.
  // Here, we replace our data structure with a structured spherical volume
  // for a spherical domain.
  VKLVolume volume = vklNewVolume(device, "structuredSpherical");

  vklSetVec3i(volume, "dimensions", res, res, res);
  const float spacing = 1.f / static_cast<float>(res);
  // We must adapt gridSpacing, as structuredSpherical expects spacing
  // in spherical coordinates.
  vklSetVec3f(volume, "gridSpacing", spacing, 180.f * spacing, 360.f * spacing);

  VKLData voxelData =
      vklNewData(device, voxels.size(), VKL_FLOAT, voxels.data());
  vklSetData(volume, "data", voxelData);
  vklRelease(voxelData);

  vklCommit(volume);

  VKLSampler sampler = vklNewSampler(volume);
  vklCommit(sampler);

#ifdef USE_GPU
  Framebuffer<AllocatorSycl<Pixel>> fb(64, 32, syclQueue);
#else
  Framebuffer<> fb(64, 32);
#endif

  fb.generate([=](float fx, float fy) {
    // Also try slice 1.0 to demonstrate a different view.
    const vkl_vec3f p = {fx, fy, 0.f};
    return transferFunction(vklComputeSample(&sampler, &p));
  });

  fb.drawToTerminal();

  vklRelease(sampler);
  vklRelease(volume);
  vklReleaseDevice(device);

  return 0;
}


Overwriting src/minimal_04.cpp
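The grid spacing above is given per spherical axis: radial distance first, then two angles in degrees. As an illustration of that coordinate system, the sketch below converts a Cartesian point to (r, inclination, azimuth), assuming the ISO convention in which inclination is measured from the +z axis and azimuth from the +x axis in the xy-plane. `SphericalCoord` and `cartesianToSpherical` are hypothetical names for this sketch, not Open VKL API.

```cpp
#include <cmath>

// Hypothetical type for illustration; angles are in degrees.
struct SphericalCoord {
  float r;               // radial distance
  float inclinationDeg;  // angle from the +z axis, in [0, 180]
  float azimuthDeg;      // angle from the +x axis in the xy-plane, in [0, 360)
};

inline SphericalCoord cartesianToSpherical(float x, float y, float z)
{
  const float radToDeg = 180.f / 3.14159265358979f;
  const float r        = std::sqrt(x * x + y * y + z * z);
  if (r == 0.f) {
    // Angles are undefined at the origin; return zeros by convention.
    return {0.f, 0.f, 0.f};
  }
  // Clamp guards against rounding pushing the ratio slightly outside [-1, 1].
  const float cosInclination = std::fmax(-1.f, std::fmin(1.f, z / r));
  const float inclination    = std::acos(cosInclination) * radToDeg;
  float azimuth              = std::atan2(y, x) * radToDeg;
  if (azimuth < 0.f) {
    azimuth += 360.f;  // map (-180, 180] to [0, 360)
  }
  return {r, inclination, azimuth};
}
```

With `gridSpacing` set to `(spacing, 180.f * spacing, 360.f * spacing)` and `spacing = 1 / res`, dividing each component of the result by the matching spacing gives a grid coordinate between 0 and `res`.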


***

#### 4.5 minimal_05.cpp

Creating a ray marching volume renderer.


In [5]:
%%writefile src/minimal_05.cpp

// Copyright 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#ifdef USE_GPU
#include "gpu.h"
#endif

#include "create_voxels.h"
#include "framebuffer.h"

#include <openvkl/openvkl.h>

#include <openvkl/device/openvkl.h>

int main(int argc, char **argv)
{
#ifdef USE_GPU
  sycl::queue syclQueue = initSyclQueue();
#endif

  vklInit();

#ifndef USE_GPU
  VKLDevice device = vklNewDevice("cpu");
#else
  VKLDevice device          = vklNewDevice("gpu");
  sycl::context syclContext = syclQueue.get_context();

  vklDeviceSetVoidPtr(device, "syclContext", static_cast<void *>(&syclContext));
#endif

  vklCommitDevice(device);

  constexpr size_t res      = 128;
  std::vector<float> voxels = createVoxels(res);

  VKLVolume volume = vklNewVolume(device, "structuredRegular");
  vklSetVec3i(volume, "dimensions", res, res, res);

  const float spacing = 1.f / static_cast<float>(res);
  vklSetVec3f(volume, "gridSpacing", spacing, spacing, spacing);
  VKLData voxelData =
      vklNewData(device, voxels.size(), VKL_FLOAT, voxels.data());
  vklSetData(volume, "data", voxelData);
  vklRelease(voxelData);

  vklCommit(volume);

  VKLSampler sampler = vklNewSampler(volume);
  vklCommit(sampler);

#ifdef USE_GPU
  Framebuffer<AllocatorSycl<Pixel>> fb(64, 32, syclQueue);
#else
  Framebuffer<> fb(64, 32);
#endif

  // We trace the volume with simple ray marching.
  // Conceptually, this is a series of camera-aligned,
  // semi transparent planes.
  // We walk along the ray in regular steps.
  const int numSteps = 8;
  const float tMax   = 1.f;
  const float tStep  = tMax / numSteps;
  fb.generate([=](float fx, float fy) {
    Color color = {0.f};
    for (int i = 0; i < numSteps; ++i) {
      const vkl_vec3f p = {fx, fy, i * tStep};
      const Color c     = transferFunction(vklComputeSample(&sampler, &p));

      // We use the over operator to blend semi-transparent
      // "surfaces" together.
      color = over(color, c);

      // Now we've created a very simple volume renderer using
      // Open VKL!
    }
    return color;
  });

  fb.drawToTerminal();

  vklRelease(sampler);
  vklRelease(volume);
  vklReleaseDevice(device);

  return 0;
}

Overwriting src/minimal_05.cpp
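The `over` helper used in the loop comes from `framebuffer.h`, which is not shown here. The sketch below is one common formulation, assuming colors with premultiplied alpha and front-to-back accumulation; the actual helper may differ. `rayMarch` is a hypothetical wrapper around the same loop, and the early-exit test is an optional optimization that the original example does not use.

```cpp
#include <functional>

// Hypothetical color type; r, g, b are assumed premultiplied by a.
struct Color {
  float r, g, b, a;
};

// Composites `front` over `back`: the back color only shows through the
// fraction of the pixel that the front color leaves uncovered.
inline Color over(const Color &front, const Color &back)
{
  const float t = 1.f - front.a;
  return {front.r + t * back.r,
          front.g + t * back.g,
          front.b + t * back.b,
          front.a + t * back.a};
}

// Walks the ray front to back in numSteps regular steps over [0, tMax).
// shade(t) returns the color at ray parameter t.
inline Color rayMarch(const std::function<Color(float)> &shade,
                      int numSteps,
                      float tMax)
{
  Color color       = {0.f, 0.f, 0.f, 0.f};
  const float tStep = tMax / static_cast<float>(numSteps);
  for (int i = 0; i < numSteps; ++i) {
    // Optional: stop once the accumulated color is nearly opaque, because
    // later samples can no longer contribute visibly.
    if (color.a >= 0.99f) {
      break;
    }
    color = over(color, shade(static_cast<float>(i) * tStep));
  }
  return color;
}
```

In minimal_05.cpp, `shade(t)` corresponds to sampling the volume at `{fx, fy, t}` and passing the result through `transferFunction`.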


***

#### 4.6 minimal_06.cpp

Creating an isosurface renderer with hit iterators.

In [6]:
%%writefile src/minimal_06.cpp

// Copyright 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#ifdef USE_GPU
#include "gpu.h"
#endif

#include "create_voxels.h"
#include "framebuffer.h"

#include <openvkl/openvkl.h>

#include <openvkl/device/openvkl.h>

#if defined(_MSC_VER)
#include <malloc.h>
#else
#include <alloca.h>
#endif

int main(int argc, char **argv)
{
#ifdef USE_GPU
  sycl::queue syclQueue = initSyclQueue();
#endif

  vklInit();

#ifndef USE_GPU
  VKLDevice device = vklNewDevice("cpu");
#else
  VKLDevice device          = vklNewDevice("gpu");
  sycl::context syclContext = syclQueue.get_context();

  vklDeviceSetVoidPtr(device, "syclContext", static_cast<void *>(&syclContext));
#endif

  vklCommitDevice(device);
  constexpr size_t res      = 128;
  std::vector<float> voxels = createVoxels(res);

  VKLVolume volume = vklNewVolume(device, "structuredRegular");
  vklSetVec3i(volume, "dimensions", res, res, res);
  const float spacing = 1.f / static_cast<float>(res);
  vklSetVec3f(volume, "gridSpacing", spacing, spacing, spacing);
  VKLData voxelData =
      vklNewData(device, voxels.size(), VKL_FLOAT, voxels.data());
  vklSetData(volume, "data", voxelData);
  vklRelease(voxelData);
  vklCommit(volume);

  VKLSampler sampler = vklNewSampler(volume);
  vklCommit(sampler);

  const float isovalues[]       = {-.6f, -.1f, .4f, .9f};
  VKLHitIteratorContext context = vklNewHitIteratorContext(sampler);
  VKLData isovaluesData         = vklNewData(device, 4, VKL_FLOAT, isovalues);
  vklSetData(context, "values", isovaluesData);
  vklRelease(isovaluesData);
  vklCommit(context);

#ifdef USE_GPU
  Framebuffer<AllocatorSycl<Pixel>> fb(64, 32, syclQueue);
#else
  Framebuffer<> fb(64, 32);
#endif
  // We will create iterators below, and we will need to know how much memory
  // to allocate.
  const size_t iteratorSize = vklGetHitIteratorSize(&context);

#ifdef USE_GPU
  char *buffer = sycl::malloc_device<char>(iteratorSize, syclQueue);
#endif

  fb.generate([=](float fx, float fy) {
    // Set up the ray, as iterators work on rays.
    const vkl_vec3f rayOrigin    = {fx, fy, 0.f};
    const vkl_vec3f rayDirection = {0.f, 0.f, 1.f};
    const vkl_range1f rayTRange  = {0.f, 1.f};

// Create a buffer for the iterator.
#ifndef USE_GPU
#if defined(_MSC_VER)
    char *buffer = static_cast<char *>(_malloca(iteratorSize));
#else
    char *buffer = static_cast<char *>(alloca(iteratorSize));
#endif
#endif

    // Initialize iterator into the buffer we just created.
    VKLHitIterator hitIterator = vklInitHitIterator(
        &context, &rayOrigin, &rayDirection, &rayTRange, 0.f, buffer);

    // Loop over all ray-isosurface intersections along our ray.
    // vklIterateHit will return false when there
    // are no more hits left.
    VKLHit hit;
    Color color = {0.f};
    while (vklIterateHit(hitIterator, &hit)) {
      const Color c = transferFunction(hit.sample);
      color         = over(color, c);
    }
    return color;
  });

  fb.drawToTerminal();

#ifdef USE_GPU
  sycl::free(buffer, syclQueue);
#endif

  vklRelease(context);
  vklRelease(sampler);
  vklRelease(volume);
  vklReleaseDevice(device);

  return 0;
}


Overwriting src/minimal_06.cpp


***

### 5. Build 

Depending on the architecture to target with the program, there are two different build approaches:

For CPU:

```sh
  #!/bin/bash
  source /opt/intel/oneapi/setvars.sh
  rm -r build_CPU
  mkdir build_CPU
  cd build_CPU
  cmake ../script_CPU
  cmake --build . --verbose
```

For GPU:

```sh
  #!/bin/bash
  source /opt/intel/oneapi/setvars.sh
  rm -r build_GPU
  mkdir build_GPU
  cd build_GPU
  cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=icpx ../script_GPU
  cmake --build . --verbose
```

To run the examples on the GPU, the program has to be compiled by running the `build_GPU.sh` script in the cell below.


In [7]:
! ./build_GPU.sh

 
  
rm: cannot remove 'build_GPU': No such file or directory
-- The CXX compiler identification is Int

***

### 6. Run

Execute the `./run_GPU.sh` script below to run the program.

In [None]:
! ./run_GPU.sh

## ue67fcfb20b4827a1ed6842c10ec58c1 is running vklMinimal_GPU_01 through 06


vklMinimal_GPU_01
Target SYCL device: Intel(R) Data Center GPU Max 1100

[Terminal output: the framebuffer drawn as a grid of 24-bit ANSI background-color cells, starting with a ramp from orange through black to blue; output truncated]

***

## Summary
In this module you learned:

* The overview of Intel® Open VKL.
* The architectures that Intel® Open VKL supports.
* How to set up a basic application to which Open VKL can be added.
* The first steps to call Open VKL.
* How to create an Open VKL volume and render slices of it.
* How to select a spherical volume.
* How to create a volume renderer.
* How to create isosurfaces in the volume data.
* How to target GPU architecture with the Intel® Open VKL minimal tutorial.

## Resources
* [https://www.openvkl.org/](https://www.openvkl.org/)
* [github.com/openvkl](https://github.com/openvkl/openvkl/tree/master)
* [Intel Rendering Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/rendering-toolkit.html)


***