CUDA Path Tracer

Scene: test_3.txt, Iterations: 30000, Camera lens: 0.5, Camera focal length: 3

Name: Gizem Dal
- LinkedIn, personal website

Project Description

This is a CUDA-based path tracer capable of rendering globally-illuminated images very quickly with the power of parallelization. Check out the CUDA Denoiser project to see how the denoising approach discussed in the Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering paper by Dammertz, Sewtz, Hanika, and Lensch is implemented for a CUDA path tracer.

Material Overview

Material shading is split into different BSDF evaluation functions based on material type. Supported materials include diffuse, mirror, perfect refractive, Fresnel dielectric and glossy materials. Diffuse scattering is computed with cosine-weighted samples within a hemisphere. Fresnel dielectric materials are defined in the scene file with an index of refraction, which BSDF evaluation uses to compute the probability of refracting versus reflecting the scattered ray. The mirror scattering function reflects the ray about the surface normal, while glossy reflection happens within a lobe whose width is controlled by the specular exponent of the material.
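The scattering rules above can be illustrated with a minimal, CPU-side C++ sketch. It is not the renderer's code: the `Vec3` struct and the function names are hypothetical, directions are expressed in a local frame whose +Z axis is the surface normal, and Schlick's approximation is assumed for the Fresnel term (the README does not say which Fresnel formulation is used).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for a glm::vec3

// Cosine-weighted direction on the unit hemisphere around +Z.
// u1 and u2 are uniform random numbers in [0, 1].
Vec3 cosineSampleHemisphere(float u1, float u2) {
    assert(u1 >= 0.0f && u1 <= 1.0f && u2 >= 0.0f && u2 <= 1.0f);
    const float kTwoPi = 6.28318530718f;
    float r = std::sqrt(u1);          // radius of a uniform sample on the unit disk
    float phi = kTwoPi * u2;          // angle of that sample
    float z = std::sqrt(1.0f - u1);   // project the disk sample up onto the hemisphere
    return { r * std::cos(phi), r * std::sin(phi), z };
}

// Schlick's approximation of Fresnel reflectance (an assumption, see above).
// The result can be used as the probability of reflecting instead of refracting.
float schlickReflectance(float cosTheta, float etaI, float etaT) {
    float r0 = (etaI - etaT) / (etaI + etaT);
    r0 *= r0;
    float m = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}

// Mirror reflection of the incoming direction d about the unit normal n.
Vec3 reflect(const Vec3& d, const Vec3& n) {
    float k = 2.0f * (d.x * n.x + d.y * n.y + d.z * n.z);
    return { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
}
```

For a glass-like index of refraction of 1.5 seen head-on from air, the reflect probability comes out at about 4%, which is why most camera-facing rays refract.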

Speculars

Fresnel dielectric	Perfect refractive	Perfect specular

Imperfect Specular Reflection

Exponent = 0	Exponent = 5	Exponent = 12	Exponent = 50	Exponent = 500

Lower specular exponent values give results that are closer to diffuse scattering, while larger specular exponent values result in tighter, sharper highlights and more mirror-like surfaces.
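To see why the exponent has this effect, here is a small sketch assuming a Phong-style lobe sampled around the perfect mirror direction. The function names and the `Vec3` struct are hypothetical, and the renderer's actual lobe sampling may differ.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for a glm::vec3

// Cosine of the angle between the sampled direction and the mirror direction.
// u is a uniform random number in [0, 1]. A larger exponent pushes the result
// toward 1, so samples cluster around the mirror direction.
float sampleLobeCosTheta(float u, float exponent) {
    assert(u >= 0.0f && u <= 1.0f && exponent >= 0.0f);
    return std::pow(u, 1.0f / (exponent + 1.0f));
}

// Sampled direction in a local frame whose +Z axis is the mirror direction.
Vec3 sampleLobeDirection(float u1, float u2, float exponent) {
    const float kTwoPi = 6.28318530718f;
    float cosTheta = sampleLobeCosTheta(u1, exponent);
    float sinTheta = std::sqrt(std::fmax(0.0f, 1.0f - cosTheta * cosTheta));
    float phi = kTwoPi * u2;
    return { sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta };
}
```

With an exponent of 0 the cosine is spread evenly over [0, 1], which reads as near-diffuse; with an exponent of 500 almost every sample lands within a few degrees of the mirror direction.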

Features Overview

GLTF Mesh Loading

Icosahedron	Magnolia	Duck

In order to bring the mesh data into C++, I used the tinygltf library. I used the VBOs from the imported data to create the mesh triangles and store triangle information per arbitrary mesh.

Bounding volume intersection culling is applied at the ray-geometry intersection test to reduce the number of rays that have to be checked against the entire mesh by first checking rays against a volume that completely bounds the mesh. This feature is implemented as toggleable for performance analysis purposes. Pressing the 'B' key while running the GPU renderer will enable this feature.
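A minimal sketch of such a culling test, assuming the bounding volume is an axis-aligned box and using the standard slab method. `Vec3`, `Aabb` and `rayHitsAabb` are hypothetical names, not taken from the project source.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };   // hypothetical stand-in for a glm::vec3
struct Aabb { Vec3 min, max; };   // hypothetical axis-aligned bounding box

// Returns true if the ray origin + t * dir (t >= 0) touches the box.
// Only when this returns true do the mesh triangles need to be tested.
bool rayHitsAabb(const Vec3& origin, const Vec3& dir, const Aabb& box) {
    assert(box.min.x <= box.max.x && box.min.y <= box.max.y && box.min.z <= box.max.z);
    const float org[3] = { origin.x, origin.y, origin.z };
    const float d[3]   = { dir.x, dir.y, dir.z };
    const float lo[3]  = { box.min.x, box.min.y, box.min.z };
    const float hi[3]  = { box.max.x, box.max.y, box.max.z };

    float tNear = 0.0f;
    float tFar = std::numeric_limits<float>::max();
    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0f) {
            // Ray is parallel to this slab: it must start inside it.
            if (org[axis] < lo[axis] || org[axis] > hi[axis]) return false;
            continue;
        }
        float inv = 1.0f / d[axis];
        float t0 = (lo[axis] - org[axis]) * inv;
        float t1 = (hi[axis] - org[axis]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
        if (tNear > tFar) return false;  // slab intervals no longer overlap
    }
    return true;
}
```

The test costs a handful of multiplications per ray, so it pays off when the mesh behind the box has many triangles and most rays miss the box.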

In order to smooth the shading on round GLTF meshes, the intersection normal is computed by barycentric interpolation of the 3 vertex normals of the triangle.
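As a sketch of that interpolation, assuming the intersection routine reports barycentric coordinates `u` and `v` for the second and third vertex (the names and the `Vec3` struct are hypothetical):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for a glm::vec3

// Blend the three vertex normals with the barycentric weights of the hit
// point and renormalize, since a weighted sum of unit vectors is shorter
// than unit length in general.
Vec3 interpolateNormal(const Vec3& n0, const Vec3& n1, const Vec3& n2,
                       float u, float v) {
    assert(u >= 0.0f && v >= 0.0f && u + v <= 1.0f + 1e-5f);
    float w = 1.0f - u - v;  // weight of the first vertex
    Vec3 n = { w * n0.x + u * n1.x + v * n2.x,
               w * n0.y + u * n1.y + v * n2.y,
               w * n0.z + u * n1.z + v * n2.z };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f) return n0;  // degenerate case: fall back to a vertex normal
    return { n.x / len, n.y / len, n.z / len };
}
```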

Depth of Field

Focal Distance = 30, Lens Radius = 2.5	Focal Distance = 20, Lens Radius = 2.5

The scene camera can be given a focal distance and a lens radius to produce a depth of field effect. Geometry located at the focal distance stays in focus, while geometry nearer to or farther from the camera is blurred, more strongly as the lens radius grows.
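A minimal sketch of the usual thin-lens construction behind this effect, written in camera space (camera at the origin, looking down +Z). All names are hypothetical, and the lens sample is assumed to be a point already drawn on the unit disk.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };       // hypothetical stand-in for a glm::vec3
struct Ray { Vec3 origin, direction; };

// pinholeDir: direction of the ordinary pinhole camera ray (z > 0).
// lensU, lensV: sample on the unit disk; lensRadius scales it to the lens.
// Every ray generated for the same pinholeDir passes through the same point
// on the plane z = focalDistance, so only that plane is perfectly sharp.
Ray thinLensRay(const Vec3& pinholeDir, float lensU, float lensV,
                float lensRadius, float focalDistance) {
    assert(pinholeDir.z > 0.0f && focalDistance > 0.0f && lensRadius >= 0.0f);
    float t = focalDistance / pinholeDir.z;
    Vec3 focus = { pinholeDir.x * t, pinholeDir.y * t, focalDistance };
    Vec3 origin = { lensRadius * lensU, lensRadius * lensV, 0.0f };
    Vec3 d = { focus.x - origin.x, focus.y - origin.y, focus.z - origin.z };
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return { origin, { d.x / len, d.y / len, d.z / len } };
}
```

A lens radius of 0 collapses the origin back to a single point, which recovers the pinhole camera with everything in focus.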

Anti-aliasing

Anti-aliasing enabled	Anti-aliasing disabled

Using anti-aliasing for subpixel sampling results in smoother geometry edges in the render results. It is important to note that anti-aliasing and first bounce cache do not work together, since the pixel samples will differ per iteration and cached first bounces from the first iteration won't match the generated ray direction in further iterations. In order to provide flexibility, I set first bounce cache usage as a toggleable feature rather than the default, so that anti-aliasing could be enabled if the first bounce cache is not used.
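The conflict with the first bounce cache is easy to see in a small sketch of the subpixel sampling step. The function and struct names are hypothetical and the random numbers are passed in rather than generated.

```cpp
#include <cassert>

struct Vec2 { float x, y; };  // hypothetical stand-in for a glm::vec2

// Returns the continuous image-plane position sampled for pixel (px, py).
// u and v are uniform random numbers in [0, 1) drawn once per iteration.
Vec2 pixelSample(int px, int py, float u, float v, bool antiAliasing) {
    assert(u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f);
    if (!antiAliasing) {
        // Always the pixel center: every iteration shoots the same camera
        // ray, which is what makes a cached first bounce reusable.
        return { px + 0.5f, py + 0.5f };
    }
    // Jittered position: the camera ray changes every iteration, so a
    // first bounce cached in iteration 1 no longer matches.
    return { px + u, py + v };
}
```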

SDF-Based Implicit Surfaces

Tanglecube	Bounding Box

Ray-geometry intersection for implicit surfaces is computed with signed distance functions (SDFs) and ray marching. The sign of the SDF return value determines whether a point is outside, on or inside the implicit geometry. Ray marching follows the ray direction in small increments, passes the current position on the ray to the SDF, and either reports an intersection or terminates once the maximum marching distance is reached. These implicit surfaces can be combined with specular BSDFs such as mirror or Fresnel dielectric materials.
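A minimal sketch of fixed-step marching against a tanglecube. It assumes the commonly used tanglecube implicit function, which is not a true distance function, so only its sign is used here; the constant 11.8, the step size and all names are assumptions rather than values taken from the project source.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for a glm::vec3

// Implicit tanglecube: negative inside the surface, positive outside.
float tanglecube(const Vec3& p) {
    float x2 = p.x * p.x, y2 = p.y * p.y, z2 = p.z * p.z;
    return x2 * x2 - 5.0f * x2 + y2 * y2 - 5.0f * y2 + z2 * z2 - 5.0f * z2 + 11.8f;
}

// Walk along origin + t * dir in increments of stepSize. Returns true and
// writes the distance to tHit at the first sample that is on or inside the
// surface; returns false once maxDistance is exceeded.
bool marchRay(const Vec3& origin, const Vec3& dir, float stepSize,
              float maxDistance, float& tHit) {
    assert(stepSize > 0.0f && maxDistance > 0.0f);
    for (int i = 0; i * stepSize <= maxDistance; ++i) {
        float t = i * stepSize;
        Vec3 p = { origin.x + t * dir.x, origin.y + t * dir.y, origin.z + t * dir.z };
        if (tanglecube(p) <= 0.0f) {
            tHit = t;
            return true;
        }
    }
    return false;
}
```

The step size trades accuracy for speed: a large step can skip thin features entirely, while a small one multiplies the number of function evaluations per ray.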

Mirror Tanglecube	Glass Bounding Box

Note: For the best render results, the bounding box should have a scale value of 1 for all scale components.

Procedural Textures

FBM	Noise

I closely followed the procedural texture implementations from the link provided on top of the render images. I find the Book of Shaders noise texture implementations to be pretty useful for generating aesthetically pleasing procedural textures from fragment data. The two procedural textures currently supported by the renderer are a Fractal Brownian Motion (FBM) texture, which accumulates noise in a loop to create a fractal-looking pattern, and Wood Noise with a swirl effect.
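A compact sketch of the FBM idea in the style of the Book of Shaders: value noise is summed over several octaves while the frequency doubles and the amplitude halves. The hash constants, the 2D domain and the function names are illustrative assumptions and may not match the renderer's version.

```cpp
#include <cassert>
#include <cmath>

static float fract(float x) { return x - std::floor(x); }

// Pseudo-random value in [0, 1] for a lattice point.
static float hash2(float x, float y) {
    return fract(std::sin(x * 12.9898f + y * 78.233f) * 43758.5453f);
}

static float mix(float a, float b, float t) { return a + (b - a) * t; }

// Value noise: smooth interpolation of the hashes at the four cell corners.
float valueNoise(float x, float y) {
    float ix = std::floor(x), iy = std::floor(y);
    float fx = x - ix, fy = y - iy;
    float a = hash2(ix, iy),        b = hash2(ix + 1.0f, iy);
    float c = hash2(ix, iy + 1.0f), d = hash2(ix + 1.0f, iy + 1.0f);
    float ux = fx * fx * (3.0f - 2.0f * fx);  // smoothstep weights
    float uy = fy * fy * (3.0f - 2.0f * fy);
    return mix(mix(a, b, ux), mix(c, d, ux), uy);
}

// Fractal Brownian Motion: one noise lookup per octave.
float fbm(float x, float y, int octaves) {
    assert(octaves > 0);
    float value = 0.0f;
    float amplitude = 0.5f;
    for (int i = 0; i < octaves; ++i) {
        value += amplitude * valueNoise(x, y);
        x *= 2.0f;          // double the frequency
        y *= 2.0f;
        amplitude *= 0.5f;  // halve the contribution
    }
    return value;
}
```

Each octave costs four hash evaluations, so the shading cost grows linearly with the octave count.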

Note: The procedural textures work best with round geometry such as spheres, round arbitrary meshes, the tanglecube, etc.

FBM Tanglecube	Noise Duck

Stratified Sampling

Stratified (4 samples)	Random (4 samples)

Stratified (4 samples close up)	Random (4 samples close up)

Although the difference is hard to see at larger sample counts, stratified samples give slightly more converged results than purely random samples at very low iteration counts.
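A minimal sketch of stratified (jittered grid) sampling on the unit square. The names are hypothetical, and how the renderer maps sample indices to iterations is not covered here.

```cpp
#include <cassert>

struct Vec2 { float x, y; };  // hypothetical stand-in for a glm::vec2

// Sample number `index` out of gridSize * gridSize is confined to its own
// grid cell; u and v in [0, 1) jitter it inside that cell. Purely random
// sampling would instead let several samples clump in the same region.
Vec2 stratifiedSample(int index, int gridSize, float u, float v) {
    assert(gridSize > 0 && index >= 0 && index < gridSize * gridSize);
    int cellX = index % gridSize;
    int cellY = index / gridSize;
    float cellWidth = 1.0f / gridSize;
    return { (cellX + u) * cellWidth, (cellY + v) * cellWidth };
}
```

With 4 samples (a 2x2 grid) every quadrant of the square receives exactly one sample, which is why the benefit is most visible at very low sample counts.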

Hierarchical Spatial Structure - Octree (In progress)

I started implementing a hierarchical spatial structure called an octree. The purpose of this data structure is to contain the scene geometry within child nodes (at most 8 children per node) by using 3D bounding boxes, with the goal of eliminating naive geometry iteration in the ray-scene intersection test and thus improving rendering performance. Due to time constraints, this feature is not complete yet, though it is still in the works.
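Since this feature is unfinished, the following is only a generic sketch of the core octree bookkeeping, not the project's design: picking the child octant for a point and computing that child's bounding box. All names are hypothetical.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };   // hypothetical stand-in for a glm::vec3
struct Aabb { Vec3 min, max; };   // hypothetical axis-aligned bounding box

// Index (0..7) of the child octant containing p: bit 0 is set for the
// upper half along X, bit 1 along Y and bit 2 along Z.
int childIndex(const Vec3& center, const Vec3& p) {
    return (p.x >= center.x ? 1 : 0) |
           (p.y >= center.y ? 2 : 0) |
           (p.z >= center.z ? 4 : 0);
}

// Bounding box of the child octant with the given index.
Aabb childBounds(const Aabb& parent, int index) {
    assert(index >= 0 && index < 8);
    Vec3 c = { 0.5f * (parent.min.x + parent.max.x),
               0.5f * (parent.min.y + parent.max.y),
               0.5f * (parent.min.z + parent.max.z) };
    Aabb child;
    child.min.x = (index & 1) ? c.x : parent.min.x;
    child.max.x = (index & 1) ? parent.max.x : c.x;
    child.min.y = (index & 2) ? c.y : parent.min.y;
    child.max.y = (index & 2) ? parent.max.y : c.y;
    child.min.z = (index & 4) ? c.z : parent.min.z;
    child.max.z = (index & 4) ? parent.max.z : c.z;
    return child;
}
```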

Denoiser

This project also includes a pathtracing denoiser that uses geometry buffers (G-buffers) to guide a smoothing filter. The technique is based on the Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering paper by Dammertz, Sewtz, Hanika, and Lensch. The example renders below show the effect of this denoiser with the following parameters:

Filter Size: 10
Blur Size: 64
Col_W: 38.398
Nor_W: 0.610
Pos_W: 3.315
Light Intensity: 1

Original Render (10 iterations)	Denoised Render (10 iterations)

Original Render (2 iterations)	Denoised Render (2 iterations)

GBuffers include scene geometry information such as per-pixel normals and per-pixel positions, as well as surface color for preserving detail in mapped or procedural textures. The current implementation stores per-pixel metrics only from the first bounce.

Position	Normal	Base Color
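To illustrate how these buffers guide the filter, here is a minimal sketch of an edge-stopping weight between a pixel and one of its filter taps. The Gaussian-like falloff on squared distance is one common form and is an assumption here; the struct and function names are hypothetical, and the three sigma parameters play the role of Col_W, Nor_W and Pos_W.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for a glm::vec3
struct GBufferPixel { Vec3 color, normal, position; };

static float distanceSquared(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Weight in (0, 1]: 1 when the two values agree, falling toward 0 as they
// diverge. A larger sigma tolerates bigger differences.
float edgeWeight(const Vec3& a, const Vec3& b, float sigma) {
    assert(sigma > 0.0f);
    return std::exp(-distanceSquared(a, b) / (sigma * sigma));
}

// A neighbor contributes fully only if its color, normal and position all
// resemble those of the center pixel, so blur does not cross edges.
float combinedWeight(const GBufferPixel& p, const GBufferPixel& q,
                     float colorSigma, float normalSigma, float positionSigma) {
    return edgeWeight(p.color, q.color, colorSigma) *
           edgeWeight(p.normal, q.normal, normalSigma) *
           edgeWeight(p.position, q.position, positionSigma);
}
```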

Denoiser Parameters

Denoiser parameters such as filter and blur sizes, number of iterations, color/normal/position weights are adjustable from the provided Imgui GUI.

Blur Size

The examples below use the following parameters:

Filter Size: 5
Iterations: 2
Col_W: 11.789
Nor_W: 0.610
Pos_W: 0.407

Blur=10	Blur=33	Blur=80	Blur=185

Increasing the blur size allows more denoiser iterations, since the blur size determines the largest expansion width the filter can reach. Applying more iterations results in smoother images, especially for a very small number of path tracer iterations (only 2 in this example).

Color Weight

The examples below use the following parameters:

Filter Size: 5
Iterations: 2
Blur Size: 80
Nor_W: 0.610
Pos_W: 0.407

Col_W=1.423	Col_W=4.675	Col_W=15.244	Col_W=30.285

The results suggest that increasing the color weight produces smoother denoised results: lower color weights leave more "fireflies" in the denoised render, while larger weights smooth them out. However, it is important to mention that the impact of the color weight may vary between implementations. The current adaptation of the A-Trous approach halves the color weight at each denoise blur iteration in order to smooth smaller illumination variations, as suggested by the paper.
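The per-iteration behaviour can be sketched as a schedule of filter passes. Doubling the step width between passes is the core of the A-Trous scheme, and halving the color weight follows the description above. The stopping rule, which ends the loop once the dilated filter footprint would exceed the blur size, is an assumption, as are all names.

```cpp
#include <cassert>
#include <vector>

struct DenoisePass {
    int stepWidth;      // spacing between filter taps in pixels
    float colorWeight;  // color sigma used by this pass
};

// Builds the list of passes for a given filter size, blur size and initial
// color weight.
std::vector<DenoisePass> buildSchedule(int filterSize, int blurSize, float colorWeight) {
    assert(filterSize > 1 && blurSize >= filterSize && colorWeight > 0.0f);
    std::vector<DenoisePass> passes;
    // Footprint of a filter with taps spaced `step` pixels apart:
    // (filterSize - 1) * step + 1 pixels.
    for (int step = 1; (filterSize - 1) * step + 1 <= blurSize; step *= 2) {
        passes.push_back({ step, colorWeight });
        colorWeight *= 0.5f;  // later passes tolerate smaller color differences
    }
    return passes;
}
```

Under these assumptions a filter size of 5 with a blur size of 80 yields 5 passes with step widths 1, 2, 4, 8 and 16.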

Light Size

The examples below use the following parameters:

Filter Size: 5
Iterations: 10
Blur Size: 63
Col_W: 28.455
Nor_W: 0.813
Pos_W: 3.862
Light Intensity: 1

Light Width = 3	Light Width = 5	Light Width = 7	Light Width = 10

While increasing the light size gives better-illuminated renders in these examples, increasing the light intensity may produce less smooth results with a lot of fireflies. The example renders below use the following parameters:

Filter Size: 10
Iterations: 10
Blur Size: 64
Col_W: 38.398
Nor_W: 0.610
Pos_W: 3.315

Intensity = 1	Intensity = 2

Filter Size

The examples below use the following parameters (The blur sizes are adjusted to allow the same number of blur iterations for different filter sizes):

Iterations: 10
Col_W: 16.667
Nor_W: 0.610
Pos_W: 0.813

Filter = 5, Blur = 80	Filter = 9, Blur = 80	Filter = 21, Blur = 81	Filter = 43, Blur = 103

The results suggest that changing the filter size has no significant impact on the denoiser output when the number of blur iterations is held constant.

Insights

One interesting observation is that using material sort gives more stable render results than the naive approach. The two images below were both rendered with 4950 iterations from the same camera position. The render on the left sorts rays by material type, while the one on the right uses the naive approach.

Material sort enabled	Material sort disabled

Performance Analysis

The performance is measured on a Predator G3-571 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz 2.81 GHz machine.

I used a GPU timer to conduct my performance analysis on different features and settings in the renderer. This timer is wrapped in a performance timer class, which uses CUDA events, in order to measure the time cost conveniently. The timer is started right before the iteration loop and is stopped once a certain number of iterations is reached. The measured time excludes the initial cudaMalloc() and cudaMemset() operations for the path tracer buffers, but still includes the cudaGLMapBuffer operations.

Stream Compaction

Using stream compaction to terminate paths which don't hit any geometry in the scene can be very beneficial for open scenes where some portion of the rays will shoot to void from the start. In order to show the impact of stream compaction on render path segments, I have created 3 test scenes with different layouts. All 3 test scenes are rendered with 1500 iterations.

Closed Cornell	Open Cornell I	Open Cornell II

For stream compaction analysis, I timed a single iteration of the renderer and recorded the number of remaining paths after each path termination in the 3 test scenes. I used a ray depth of 8, such that once this depth is reached all remaining paths are terminated.
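The compaction step itself can be sketched on the CPU with `std::partition`, which stands in here for a GPU partition or compaction primitive. The `PathSegment` layout and the function name are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PathSegment {
    int pixelIndex;
    int remainingBounces;  // 0 means the path has terminated
};

// Moves all live paths among the first numActive entries to the front and
// returns how many are still alive. Later bounces only need to process
// that many paths.
std::size_t compactPaths(std::vector<PathSegment>& paths, std::size_t numActive) {
    assert(numActive <= paths.size());
    auto first = paths.begin();
    auto last = first + static_cast<std::ptrdiff_t>(numActive);
    auto firstDead = std::partition(first, last, [](const PathSegment& p) {
        return p.remainingBounces > 0;
    });
    return static_cast<std::size_t>(firstDead - first);
}
```

In an open scene many paths reach `remainingBounces == 0` after the first bounce by escaping into the void, so the count returned here drops quickly; in a closed scene it stays near the full path count until the depth limit is reached.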

I also recorded the total runtime of 1 iteration per test scene.

Scene	Measured runtime (in seconds)
Closed Cornell	0.260587
Open Cornell I	0.146422
Open Cornell II	0.106178

From this data, we can conclude that using stream compaction for path termination is very beneficial performance wise for open scenes where some rays could shoot to void and not hit any geometry within the scene. Although terminating as many paths as possible when needed is a significant performance improvement, it might not be possible to terminate many rays in closed scenes such as Closed Cornell where rays cannot escape.

Volume Intersection Culling

I used 4 arbitrary mesh examples to analyze the performance benefits of enabling volume intersection culling for complex meshes. The table below shows the mesh examples used for this analysis and how many triangles they contain.

Mesh	Number of triangles
Icosahedron	20
Magnolia	1372
Duck	4212
Stanford Bunny	69630

I measured the total runtime of 6 iterations in 1 test scene per mesh. I used simple open Cornell box scenes that contain no arbitrary meshes other than the subject mesh. The runtime measurements yielded the following results.

Using volume intersection culling for simpler arbitrary meshes such as Icosahedron or Dodecahedron doesn't provide a significant performance improvement for a small number of iterations. However, as the total number of iterations increases, the savings can become more significant. Although we do not observe a significant improvement with Icosahedron, volume intersection culling improves scene intersection test performance significantly for scenes where much more complex shapes such as the Stanford Bunny are present. With just 6 iterations, volume intersection culling saved about 29 seconds.

Procedural Textures

Current procedural textures supported by the renderer make many calls to noise helper functions that call many glm math functions. In order to analyze the potential impacts of procedural textures on runtime, I created 3 simple test scenes with a sphere where the diffuse material of the sphere uses FBM, Wood noise and no texture separately and compared the total runtimes of 100 and 500 iterations.

Although the difference is not very significant due to the small number of iterations, using the FBM texture seems to be slightly less efficient than using Wood Noise or no texture at all. Since FBM functions usually call their noise helpers once per octave, these repeated function calls are a likely cause of the slowdown.

