# Graphcore Poplar Path-Tracer on Gradient

This notebook contains instructions to build and run a [Poplar C++ Ray/Path Tracer](https://github.com/markp-gc/ipu_ray_lib) for Graphcore IPUs. It demonstrates how to render a path-traced image and how to analyse the correctness of the ray-tracer by comparing its output with [Embree](https://www.embree.org).

![Suzanne Path-Traced on IPU](images/monkey_bust_16384spp.png)

## Instructions

First you need to launch an IPU machine: click the 'start machine' button above.

![Start Machine Screenshot](images/start_machine.png)

### Fetch and Build the Code

Execute the following shell commands to update and build the application. The build uses CMake and Ninja:

In [None]:
!cd ipu_ray_lib && git pull
!mkdir -p ipu_ray_lib/build
!cd ipu_ray_lib/build && cmake -Wno-dev -G Ninja .. && ninja -j64

### Run the Application

Ray data is distributed across all tiles (cores), but the scene data (BVH) is currently replicated on every tile, which means meshes need to fit in the memory of a single tile. You can specify your own scene using the `--mesh-file` option; if no file is specified, a built-in scene is rendered. To render it, execute the following cell. After about 30 seconds it will output an EXR image:

In [None]:
!cd ipu_ray_lib/build && ./trace -w 1440 -h 1440 --render-mode path-trace --visualise rgb --samples 1000 --ipus 4 --ipu-only

The resulting image is high dynamic range (HDR) in EXR format. We can define a function to perform a quick tone-mapping and then display the result in Python:

In [None]:
import matplotlib.pyplot as plt
import cv2
import numpy as np

# Function to apply simple gamma correction, rescale,
# and clip values into range 0-255:
def gamma_correct(x, exposure, gamma):
  scale = 2.0 ** exposure
  y = np.power(x * scale, 1.0 / gamma) * 255.0
  return np.clip(y, 0.0, 255.0)

# Function to plot an OpenCV image:
def display_image(img):
  plt.figure(figsize=(8, 8))
  plt.style.use('dark_background')
  # OpenCV stores channels as BGR; matplotlib expects RGB:
  plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), interpolation='bicubic')
  plt.show()

EXR_FLAGS = cv2.IMREAD_UNCHANGED | cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH
hdr = cv2.imread('ipu_ray_lib/build/out_rgb_ipu.exr', EXR_FLAGS)
print(f"HDR image shape: {hdr.shape} type: {hdr.dtype} min: {np.min(hdr)} max: {np.max(hdr)}")

ldr = gamma_correct(hdr, exposure=1.2, gamma=2.4).astype(np.uint8)
cv2.imwrite('tonemapped.png', ldr)
display_image(ldr)

You can open or download the saved result [tonemapped.png](tonemapped.png) in the file browser on the left. You may have to click on the refresh icon to update the view of the files.
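
As a quick sanity check of the tone-mapping, the sketch below applies the same formula to a few hand-picked values. It repeats the `gamma_correct` definition from the cell above so that it runs on its own. The sample values are made up for illustration and are not taken from the render:

```python
import numpy as np

def gamma_correct(x, exposure, gamma):
  # Same formula as the cell above: scale by 2^exposure, apply
  # the inverse gamma curve, then map into the 0-255 range.
  scale = 2.0 ** exposure
  y = np.power(x * scale, 1.0 / gamma) * 255.0
  return np.clip(y, 0.0, 255.0)

# Illustrative linear radiance values (not from the render):
samples = np.array([0.0, 0.125, 0.5, 4.0], dtype=np.float32)
mapped = gamma_correct(samples, exposure=1.0, gamma=2.0).astype(np.uint8)
print(mapped)
```

With an exposure of one stop every value is doubled before the gamma curve is applied, so `0.5` already reaches full white and anything brighter is clipped to 255.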

If you want to render a CPU reference image, remove the `--ipu-only` option, but be aware that it will take much longer to render. (For a list of all command options run `./trace --help`.)

If you just want to compare arbitrary output variables (AOVs) between CPU, Embree, and IPU, you can switch to a quicker render mode. For example, to compare normals run this:

In [None]:
!cd ipu_ray_lib/build && ./trace -w 1440 -h 1440 --render-mode shadow-trace --visualise normal --ipus 4

Once the outputs are ready we can load them into Python to compare:

In [None]:
# Load normals:
ipu_normals = cv2.imread('ipu_ray_lib/build/out_normal_ipu.exr', EXR_FLAGS)
cpu_normals = cv2.imread('ipu_ray_lib/build/out_normal_cpu.exr', EXR_FLAGS)
embree_normals = cv2.imread('ipu_ray_lib/build/out_normal_embree.exr', EXR_FLAGS)

compare = ipu_normals
abs_err = np.abs(compare - embree_normals)
print(f"IPU normals min: {np.min(compare)} max: {np.max(compare)}")
print(f"Embree normals min: {np.min(embree_normals)} max: {np.max(embree_normals)}")
print(f"ABS Error min: {np.min(abs_err)} max: {np.max(abs_err)} mean: {np.mean(abs_err)}")

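# Plot them side by side. Normals are mapped from [-1, 1] to [0, 1]
# and the channel order is reversed because OpenCV loads images as
# BGR whereas matplotlib expects RGB:
vis = ((compare + 1.0) / 2.0)[..., ::-1]
vis_embree = ((embree_normals + 1.0) / 2.0)[..., ::-1]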
fig, ax = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(16, 8))
ax[0].imshow(vis)
ax[0].set_title('IPU')
ax[1].imshow(vis_embree)
ax[1].set_title('Embree')
plt.show()

We can plot an error histogram, using a log scale so that the rare large errors remain visible. As you can see, most errors are tiny but there are a few outliers. These will be rays that hit alternative objects (possibly valid within machine precision) due to differences between our intersection test code and Embree's:

In [None]:
plt.hist(abs_err.flatten(), bins=300, range=[0.0, np.max(abs_err)], log=True)
plt.show()
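
One way to put a number on the outliers visible in the histogram is to count the pixels whose largest per-channel error exceeds a tolerance. The sketch below is a minimal example of that idea on synthetic data. The helper name `outlier_fraction` and the `1e-3` tolerance are assumptions chosen for illustration and are not part of the repository:

```python
import numpy as np

def outlier_fraction(a, b, tol=1e-3):
  # Largest absolute difference over the channel axis, per pixel:
  per_pixel_err = np.max(np.abs(a - b), axis=-1)
  # Fraction of pixels whose error is larger than the tolerance:
  return float(np.mean(per_pixel_err > tol))

# Synthetic 4x4 "normal maps" that agree except at a single pixel:
ref = np.zeros((4, 4, 3), dtype=np.float32)
ref[..., 2] = 1.0
test = ref.copy()
test[1, 2] = [1.0, 0.0, 0.0]  # One ray hit a different surface.

print(f"Outlier fraction: {outlier_fraction(test, ref)}")
```

With the arrays loaded in the cells above, the equivalent call would be `outlier_fraction(ipu_normals, embree_normals)`.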

#### Render the Teaser Image

Finally, we can render the teaser image from the top of the page. That was a crop from an 8k image rendered with 16k samples per pixel using 8 IPUs. To get a result more quickly, we can specify that we only want to render the cropped region, like this:

In [None]:
!cd ipu_ray_lib/build && ./trace --render-mode path-trace --visualise rgb --ipus 4 --ipu-only -w 5760 -h 5760 --crop 1360x1060+2644+2860 --samples 10000

Rendering 10 thousand samples per pixel should take about two minutes. We can then tone-map and display the rendered region, and save it as [teaser.png](teaser.png):

In [None]:
hdr = cv2.imread('ipu_ray_lib/build/out_rgb_ipu.exr', EXR_FLAGS)
print(f"Input shape: {hdr.shape}")
# slice out the rendered region:
w = 1360
h = 1060
c = 2644
r = 2860
hdr = hdr[r:r+h, c:c+w, :]
print(f"Cropped shape: {hdr.shape}")
ldr = gamma_correct(hdr, exposure=1.0, gamma=2.66).astype(np.uint8)
cv2.imwrite('teaser.png', ldr)
display_image(ldr)
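
The slice offsets above mirror the `--crop` argument, which, judging by the values used here, appears to follow a `WIDTHxHEIGHT+COLUMN+ROW` layout. To avoid keeping the two in sync by hand, a small helper could parse the string. `parse_crop` is a hypothetical helper written for this notebook, not something provided by `ipu_ray_lib`:

```python
import re

def parse_crop(spec):
  # Assumed layout: <width>x<height>+<column offset>+<row offset>
  match = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", spec)
  if match is None:
    raise ValueError(f"Unrecognised crop specification: {spec!r}")
  w, h, c, r = (int(g) for g in match.groups())
  return w, h, c, r

w, h, c, r = parse_crop("1360x1060+2644+2860")
print(w, h, c, r)
```

The returned values can be used directly in the slice `hdr[r:r+h, c:c+w, :]`.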

### Enjoy

Although the program is currently limited in the size of BVH it can render, it is very fast due to the IPU's large-scale multiple-instruction-multiple-data (MIMD) parallelism. It is a useful tool for experimenting with or learning about path-tracing because you can render high resolution results much faster than you can on a CPU, using simple C++ code. For example, the IPU kernel code is uncomplicated; the path-trace inner loop looks like this:
```C++
  for (auto i = 0u; i < maxPathLength; ++i) {
    // offset rays to avoid self intersection:
    offsetRay(hit.r, hit.normal);
    // Reset ray limits for next bounce:
    hit.r.tMin = 0.f;
    hit.r.tMax = std::numeric_limits<float>::infinity();
    auto intersected = bvh.intersect(hit.r, primLookup);

    if (intersected) {
      updateHit(intersected, hit);
      const auto& material = wrappedMaterials[wrappedMatIDs[hit.geomID]];

      if (material.emissive) {
        color += throughput * material.emission;
      }

      if (material.type == Material::Type::Diffuse) {
        // Use HW random number generator for samples:
        const float u1 = hw_uniform_0_1();
        const float u2 = hw_uniform_0_1();
        hit.r.direction = sampleDiffuse(hit.normal, u1, u2);
        // Update throughput:
        throughput *= material.albedo;
      } else if (material.type == Material::Type::Specular) {
        hit.r.direction = reflect(hit.r.direction, hit.normal);
        throughput *= material.albedo;
      } else if (material.type == Material::Type::Refractive) {
        const float u1 = hw_uniform_0_1();
        const auto [dir, refracted] = dielectric(hit.r, hit.normal, material.ior, u1);
        hit.r.direction = dir;
        if (refracted) { throughput *= material.albedo; }
      } else {
        // Mark an error:
        result.rgb *= std::numeric_limits<float>::quiet_NaN();
      }
    } else {
      break;
    }

    // Random stopping:
    if (i > rouletteStartDepth) {
      const float u1 = hw_uniform_0_1();
      if (evaluateRoulette(u1, throughput)) { break; }
    }
  }
```
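
The loop above calls `evaluateRoulette` without showing its body. A common formulation of Russian roulette (not necessarily the one used in `ipu_ray_lib`) keeps a path alive with a probability tied to its throughput and rescales surviving paths so that the estimate stays unbiased. The Python sketch below illustrates that idea. The function name, the survival-probability rule, and the clamp value are all assumptions:

```python
import numpy as np

def russian_roulette(u, throughput, max_survival=0.95):
  # Survival probability: brightest throughput channel, clamped
  # so that even bright paths can eventually terminate.
  p = min(float(np.max(throughput)), max_survival)
  if u >= p:
    # Path terminated: it contributes nothing further.
    return True, throughput
  # Path survives: divide by p to compensate for terminated paths.
  return False, throughput / p

throughput = np.array([0.5, 0.2, 0.1])
print(russian_roulette(0.9, throughput))  # u above p: path is terminated
print(russian_roulette(0.3, throughput))  # u below p: throughput rescaled
```

Dim paths are terminated more often, which saves work where it matters least, while the division by `p` keeps the expected contribution of each path unchanged.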

If you want to browse the code here are some good starting points:

- [TraceCodelets.cpp](ipu_ray_lib/codelets/TraceCodelets.cpp): IPU ray-tracing and path-tracing C++ kernels.
- [trace.cpp](ipu_ray_lib/trace.cpp): The main program. In particular note the functions: `renderEmbree`, `renderCPU`, `renderIPU`.
- [IpuScene.cpp](ipu_ray_lib/src/IpuScene.cpp): This compiles the IPU ray/path trace graph program using the Poplar graph compiler.
- [CompactBVH2Node.hpp](ipu_ray_lib/include/CompactBVH2Node.hpp): Reduced precision BVH node structure used on the IPU (24 bytes per node).
- [README.md](ipu_ray_lib/README.md): Contains more information about how the program works and its origins.
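
Because the BVH is replicated on every tile, the 24 bytes per node quoted above give a quick way to estimate how large a mesh can be. The sketch below is a rough back-of-envelope calculation only. It assumes a binary BVH with one triangle per leaf (so `2n - 1` nodes for `n` triangles) and a hypothetical per-tile memory budget, and it ignores the triangle data, code, and ray storage that also share tile memory:

```python
def bvh_bytes(num_triangles, bytes_per_node=24):
  # A full binary tree with n leaves has 2n - 1 nodes (assumes
  # one triangle per leaf, which may not match the real builder).
  num_nodes = 2 * num_triangles - 1
  return num_nodes * bytes_per_node

# Hypothetical budget: the share of tile memory left for BVH nodes.
budget_bytes = 256 * 1024

for triangles in (1_000, 5_000, 10_000):
  size = bvh_bytes(triangles)
  fits = "fits" if size <= budget_bytes else "too large"
  print(f"{triangles} triangles: {size / 1024:.1f} KiB ({fits})")
```

Adjust `budget_bytes` to match the memory actually available on your target device.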

If you want to make significant changes then you will need to consult the [Poplar SDK documentation](https://docs.graphcore.ai/projects/poplar-user-guide/en/latest/introduction.html).

If you want more speed, you can try a Bow Pod, where the IPUs are clocked 40% higher than on a standard POD.

It is possible to export simple scenes from [Blender](https://www.blender.org) for rendering, but material import is a little limited at the moment. An example scene exported to DAE format is provided in the assets folder: [DAE file](ipu_ray_lib/assets/test_scene.dae). That file is human readable, which can help in understanding how the importer interprets it.