# oneDPL image processing example

##### Sections
- _Code_: [Gamma Correction with TBB](#Gamma-correction)
- _Code_: [Gamma Correction on target device](#Gamma-correction-on-target-device)

# Gamma correction

The following exercise consists of two parts. First, we demonstrate how to use Parallel STL with TBB on a classic image processing example. Second, we use a special execution policy to target a device and offload the computation with the STL algorithm `for_each`.

## Gamma correction method

Let's take a look at a classic image processing example. Gamma correction, or gamma for short, is a non-linear operation used to encode and decode the luminance of each pixel. Our filter iterates over every pixel of the image, converts it to grayscale, and then applies the gamma correction by evaluating `pow(value, 1.0 / gamma)` on the resulting normalized gray value. For an image of `height` times `width` pixels, the operation is applied to each RGBA pixel. Fractal images with different gamma correction values look as follows. These can be regenerated by modifying our application example.

Fractal with 0.5 gamma:

![Processed fractal image](img/fractal_gamma_dark.png)

Original input image:

![Fractal image](img/fractal_original.png)

Fractal with 2.0 gamma:

![Processed fractal image](img/fractal_gamma_bright.png)

More information on gamma correction can be found at:

https://en.wikipedia.org/wiki/Gamma_correction
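
The per-pixel operation described above can be sketched in isolation. This is a minimal sketch rather than the notebook's code: the function name `to_gamma_gray` and its free-standing signature are assumptions made for illustration (the lab code works on `ImgPixel` from `utils.hpp` instead), while the grayscale weights and the `1.0 / gamma` exponent follow the lambda used in the exercise.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of utils.hpp): converts one RGB pixel to
// grayscale and applies gamma correction, returning the 8-bit gray value.
std::uint8_t to_gamma_gray(std::uint8_t r, std::uint8_t g, std::uint8_t b,
                           double gamma) {
  // Weighted grayscale conversion, normalized to [0, 1].
  const double v = (0.3 * r + 0.59 * g + 0.11 * b) / 255.0;
  // Gamma correction: raise the normalized value to the power 1 / gamma.
  const double res = 255.0 * std::pow(v, 1.0 / gamma);
  // Clamp in floating point before narrowing to 8 bits.
  return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, res)));
}
```

With this convention, `to_gamma_gray(128, 128, 128, 0.5)` yields a value darker than the mid-gray input and `to_gamma_gray(128, 128, 128, 2.0)` a brighter one, which matches the two processed fractals shown above.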

Your task is to complete the application template below and use Parallel STL to enable the image processing in parallel.


In [8]:
%%writefile lab/gamma-correction.cpp
//==============================================================
// Copyright (c) 2020 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
// =============================================================

#include <iomanip>
#include <iostream>

#include <chrono>
#include <cmath>
#include <cstdint>

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include "utils.hpp"

int main() {
  // Image size is width x height
  const int width = 450;
  const int height = 450;

  Img<ImgFormat::BMP> image{width, height};
  ImgFractal fractal{width, height};

  // Lambda to process pixel with gamma correction
  auto gamma_f = [](ImgPixel& pixel) {
    /* compute correction factor */
    const double gamma = 0.7;
    double gamma_correction = 1.0 / gamma;
    /* convert to grayscale */
    double v = (0.3 * pixel.r + 0.59 * pixel.g + 0.11 * pixel.b) / 255.0;
    /* apply gamma correction */
    double res = 255.0 * pow(v, gamma_correction);
    /* saturate before narrowing; a check after the cast to uint8_t can never fire */
    if (res > UINT8_MAX) res = UINT8_MAX;
    auto gamma_pixel = static_cast<std::uint8_t>(res);
    pixel.set(gamma_pixel, gamma_pixel, gamma_pixel, gamma_pixel);
  };

  // fill image with created fractal
  int index = 0;
  image.fill([&index, width, &fractal](ImgPixel& pixel) {
    int x = index % width;
    int y = index / width;

    auto fractal_pixel = fractal(x, y);
    if (fractal_pixel < 0) fractal_pixel = 0;
    if (fractal_pixel > 255) fractal_pixel = 255;
    pixel.set(fractal_pixel, fractal_pixel, fractal_pixel, fractal_pixel);

    ++index;
  });

  Img<ImgFormat::BMP> image2 = image;
  image.write("fractal_original.png");

  // call standard serial function for correctness check
    
  image.fill(gamma_f);
  image.write("fractal_gamma.png");

  // Image processing with Parallel STL
  // *** Step 1: Insert STL algorithm with parallel execution policy here

  // check correctness
  if (check(image.begin(), image.end(), image2.begin())) {
    std::cout << "success\n";
  } else {
    std::cout << "fail\n";
  }

  image2.write("fractal_gamma_pstl.png");

  return 0;
}

Overwriting lab/gamma-correction.cpp


# Solution:

In [2]:
%%writefile lab/gamma-correction.cpp
//==============================================================
// Copyright (c) 2020 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
// =============================================================

#include <iomanip>
#include <iostream>

#include <chrono>
#include <cmath>
#include <cstdint>

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include "utils.hpp"

int main() {
  // Image size is width x height
  const int width = 450;
  const int height = 450;

  Img<ImgFormat::BMP> image{width, height};
  ImgFractal fractal{width, height};

  // Lambda to process pixel with gamma correction
  auto gamma_f = [](ImgPixel& pixel) {
    /* compute correction factor */
    const double gamma = 0.7;
    double gamma_correction = 1.0 / gamma;
    /* convert to grayscale */
    double v = (0.3 * pixel.r + 0.59 * pixel.g + 0.11 * pixel.b) / 255.0;
    /* apply gamma correction */
    double res = 255.0 * pow(v, gamma_correction);
    /* saturate before narrowing; a check after the cast to uint8_t can never fire */
    if (res > UINT8_MAX) res = UINT8_MAX;
    auto gamma_pixel = static_cast<std::uint8_t>(res);
    pixel.set(gamma_pixel, gamma_pixel, gamma_pixel, gamma_pixel);
  };

  // fill image with created fractal
  int index = 0;
  image.fill([&index, width, &fractal](ImgPixel& pixel) {
    int x = index % width;
    int y = index / width;

    auto fractal_pixel = fractal(x, y);
    if (fractal_pixel < 0) fractal_pixel = 0;
    if (fractal_pixel > 255) fractal_pixel = 255;
    pixel.set(fractal_pixel, fractal_pixel, fractal_pixel, fractal_pixel);

    ++index;
  });

  Img<ImgFormat::BMP> image2 = image;
  image.write("fractal_original.png");

  // call standard serial function for correctness check
    
  image.fill(gamma_f);
  image.write("fractal_gamma.png");

  // Image processing with Parallel STL
  std::for_each(std::execution::par, image2.begin(), image2.end(), gamma_f);

  // check correctness
  if (check(image.begin(), image.end(), image2.begin())) {
    std::cout << "success\n";
  } else {
    std::cout << "fail\n";
  }

  image2.write("fractal_gamma_pstl.png");

  return 0;
}

Overwriting lab/gamma-correction.cpp
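
The pattern used in the solution, running the same element-wise function once serially and once with a parallel execution policy and then comparing the results, can be tried without the image helpers. The following is a minimal sketch under stated assumptions: the function `parallel_matches_serial` and the `invert` lambda are hypothetical stand-ins for the lab's `gamma_f` and `check`, and a flat `std::vector` of gray values stands in for the image.

```cpp
#include <algorithm>
#include <cstdint>
#include <execution>
#include <vector>

// Hypothetical stand-in for the lab's correctness check: applies one
// element-wise operation serially and with the parallel policy, then
// reports whether both results agree.
bool parallel_matches_serial(std::vector<std::uint8_t> pixels) {
  // Each call touches only its own element, so parallel execution is safe.
  auto invert = [](std::uint8_t& p) { p = static_cast<std::uint8_t>(255 - p); };

  // Serial reference result on a copy of the input.
  std::vector<std::uint8_t> reference = pixels;
  std::for_each(reference.begin(), reference.end(), invert);

  // Same operation with the C++17 parallel execution policy.
  std::for_each(std::execution::par, pixels.begin(), pixels.end(), invert);

  return reference == pixels;
}
```

The functor must not depend on shared mutable state; that is why `gamma_f` can be handed to the parallel algorithm, whereas the fractal fill lambda, which increments a captured `index`, cannot. Depending on the toolchain, the parallel policies may need an extra backend library at link time (with GCC's standard library this is typically TBB).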


In [5]:
! chmod 755 q; chmod 755 run_gamma_correction.sh; if [ -x "$(command -v qsub)" ]; then ./q run_gamma_correction.sh; else ./run_gamma_correction.sh; fi

Job has been submitted to Intel(R) DevCloud and will execute soon.

 If you do not see result in 60 seconds, please restart the Jupyter kernel:
 Kernel -> 'Restart Kernel and Clear All Outputs...' and then try again

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
688791.v-qsvr-1            ...ub-singleuser u48845          00:01:11 R jupyterhub     
688820.v-qsvr-1            ...correction.sh u48845                 0 Q batch          

Waiting for Output ████████████████████ Done⬇

########################################################################
#      Date:           Wed Sep 16 19:30:59 PDT 2020
#    Job ID:           688820.v-qsvr-1.aidevcloud
#      User:           u48845
# Resources:           neednodes=1:ppn=2,nodes=1:ppn=2,walltime=06:00:00
########################################################################

## u48845 is compiling STL Introduction sample sort
succe

# Gamma correction on target device

This is your bonus exercise; it runs out of the box.
We already did the hard work and modified the previous image processing example so that it can run on a target device.

oneDPL, as part of Intel oneAPI, provides custom execution policies along with utility functions. Execution on a target device is controlled by a custom device execution policy. For interoperability with SYCL buffers, the utility functions `begin` and `end` wrap a buffer in an iterator-style interface that STL algorithms accept. The following example uses SYCL buffers as the memory abstraction. An alternative approach would be USM (unified shared memory).

In [6]:
%%writefile lab/gamma-correction-device.cpp
//==============================================================
// Copyright (c) 2020 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
// =============================================================

#include <iomanip>
#include <iostream>

#include <CL/sycl.hpp>

#include <chrono>

#include <oneapi/dpl/iterator>
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include "utils.hpp"

using namespace cl::sycl;
using namespace oneapi::dpl::execution;

int main() {
  // Image size is width x height
  int width = 720;
  int height = 480;

  Img<ImgFormat::BMP> image{width, height};
  ImgFractal fractal{width, height};

  // Lambda to process image with gamma = 0.5 (exponent 1 / gamma = 2, hence v * v)
  auto gamma_f = [](ImgPixel& pixel) {
    float v = (0.3f * pixel.r + 0.59f * pixel.g + 0.11f * pixel.b) / 255.0f;

    // saturate before narrowing; a check after the cast to uint8_t can never fire
    float res = 255.0f * v * v;
    if (res > 255.0f) res = 255.0f;
    auto gamma_pixel = static_cast<std::uint8_t>(res);
    pixel.set(gamma_pixel, gamma_pixel, gamma_pixel, gamma_pixel);
  };

  // fill image with created fractal
  int index = 0;
  image.fill([&index, width, &fractal](ImgPixel& pixel) {
    int x = index % width;
    int y = index / width;

    auto fractal_pixel = fractal(x, y);
    if (fractal_pixel < 0) fractal_pixel = 0;
    if (fractal_pixel > 255) fractal_pixel = 255;
    pixel.set(fractal_pixel, fractal_pixel, fractal_pixel, fractal_pixel);

    ++index;
  });

  Img<ImgFormat::BMP> image2 = image;
  image.write("fractal_original.png");

  // call standard serial function for correctness check
  image.fill(gamma_f);
  image.write("fractal_gamma.png");

  // create a queue for tasks, sent to the device
  //  Select either the gpu_selector or the cpu_selector or the default_selector
  //queue q(gpu_selector{});
  //queue q(cpu_selector{});
  queue q(default_selector{});

  // We need a new scope to control the buffer's destruction and copy back the data
  {
    // ****Step 1: Creating a SYCL buffer for moving data on target device
    buffer<ImgPixel, 1> buffer(image2.data(),image2.width() * image2.height());

    // ****Step 2: Create buffer iterators. These are passed to the algorithm
    auto buffer_begin = oneapi::dpl::begin(buffer);
    auto buffer_end = oneapi::dpl::end(buffer);

    //*****Step 3: Create a new device policy
    auto new_policy = make_device_policy(q);
    //*****Step 4: Call std::for_each with DPC++ support    
    std::for_each(new_policy, buffer_begin, buffer_end, gamma_f);   
  }

  // check correctness
  if (check(image.begin(), image.end(), image2.begin())) {
    std::cout << "success";
  } else {
    std::cout << "fail";
  }
  std::cout << ". Run on "
            << q.get_device().get_info<cl::sycl::info::device::name>()
            << std::endl;

  image2.write("fractal_gamma_pstl_with_sycl.png");

  return 0;
}

Overwriting lab/gamma-correction-device.cpp


In [7]:
! chmod 755 q-gpu; chmod 755 run_gamma_correction_on_device.sh; if [ -x "$(command -v qsub)" ]; then ./q-gpu run_gamma_correction_on_device.sh; else ./run_gamma_correction_on_device.sh; fi

Job has been submitted to Intel(R) DevCloud and will execute soon.

 If you do not see result in 60 seconds, please restart the Jupyter kernel:
 Kernel -> 'Restart Kernel and Clear All Outputs...' and then try again

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
688791.v-qsvr-1            ...ub-singleuser u48845          00:01:11 R jupyterhub     
688822.v-qsvr-1            ..._on_device.sh u48845                 0 Q batch          

Waiting for Output ████████████████████ Done⬇

########################################################################
#      Date:           Wed Sep 16 19:32:09 PDT 2020
#    Job ID:           688822.v-qsvr-1.aidevcloud
#      User:           u48845
# Resources:           neednodes=1:gpu:ppn=2,nodes=1:gpu:ppn=2,walltime=06:00:00
########################################################################

## u48845 is compiling Gamma correction example f