This repository has been archived by the owner on Jan 3, 2020. It is now read-only.

Commit

Initial commit
matt77hias committed Apr 6, 2016
0 parents commit 0e705a1
Showing 134 changed files with 42,053 additions and 0 deletions.
106 changes: 106 additions & 0 deletions .gitignore
@@ -0,0 +1,106 @@
scenes/

## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.

# User-specific files
*.suo
*.user
*.userosscache
*.sln.docstates

# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs

# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/

# Visual Studio 2015 cache/options directory
.vs/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/

# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*

# NUNIT
*.VisualState.xml
TestResult.xml

# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c

# DNX
project.lock.json
artifacts/

*_i.c
*_p.c
*_i.h
*.ilk
*.meta
*.obj
*.pch
*.pdb
*.pgc
*.pgd
*.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*.log
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc

# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opendb
*.opensdf
*.sdf
*.cachefile
*.VC.db
*.VC.VC.opendb

# Visual Studio profiler
*.psess
*.vsp
*.vspx
*.sap

# Click-Once directory
publish/

# Visual Studio cache files
# files ending in .cache can be ignored
*.[Cc]ache
# but keep track of directories ending in .cache
!*.[Cc]ache/

# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
24 changes: 24 additions & 0 deletions LICENSE
@@ -0,0 +1,24 @@
Copyright (c) 2009-2011, NVIDIA Corporation
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA Corporation nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
219 changes: 219 additions & 0 deletions README
@@ -0,0 +1,219 @@

Fast GPU Ray Traversal 1.4
--------------------------
Implementation by Tero Karras, Timo Aila, and Samuli Laine
Copyright 2009-2012 NVIDIA Corporation

This package contains full source code for the fast GPU-based ray traversal
routines used in the following paper:

"Understanding the Efficiency of Ray Traversal on GPUs",
Timo Aila and Samuli Laine,
Proc. High-Performance Graphics 2009
http://www.tml.tkk.fi/~timo/publications/aila2009hpg_paper.pdf

In addition to the original kernels that were optimized for NVIDIA GTX 285,
the package also includes kernels specifically hand-tuned for GTX 480
(GF100/Fermi) and GTX 680 (GK104/Kepler). The results for these GPUs have
been published in the following technical report:

"Understanding the Efficiency of Ray Traversal on GPUs - Kepler and Fermi Addendum",
Timo Aila, Samuli Laine, and Tero Karras,
NVIDIA Technical Report NVR-2012-02,
http://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum

The accompanying benchmark application and test scenes aim to replicate the
published results as accurately as possible, although there are slight
differences in the test setup (e.g. BVH builder, CUDA version).
See results.txt for details.

The source code is licensed under the New BSD License (see LICENSE), and
hosted by Google Code:

http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/


System requirements
-------------------

- Microsoft Windows XP, Vista, or 7.

- At least 1GB of system memory.

- NVIDIA CUDA-compatible GPU with compute capability 1.2 or higher and at
least 1.5 GB of GPU memory. GeForce GTX 480 or GTX 680 is recommended.

- Microsoft Visual Studio 2010. Required even if you do not plan to build
the source code, as the runtime CUDA compilation mechanism depends on it.


Instructions
------------

1. Install Visual Studio 2010. The Express edition can be downloaded from:
http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-cpp-express

2. Install the latest NVIDIA GPU drivers and CUDA Toolkit.
http://developer.nvidia.com/object/cuda_archive.html

3. Run rt.exe to start the application in interactive mode. The first
run executes certain initialization tasks that may take a while to
complete.

4. If you get an error during initialization, the most probable explanation
is that the application is unable to launch nvcc.exe contained in the
CUDA Toolkit. In this case, you should:

- Set CUDA_BIN_PATH to point to the CUDA Toolkit "bin" directory, e.g.
"set CUDA_BIN_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin".

- Set CUDA_INC_PATH to point to the CUDA Toolkit "include" directory, e.g.
"set CUDA_INC_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include".

- Run vcvars32.bat to set up Visual Studio paths, e.g.
"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\vcvars32.bat".

5. Run benchmark.cmd to measure the performance of all test scenes with
the same settings that were used in the paper. Expected performance
numbers for different GPUs are listed in "results.txt". Note that a
64-bit build is required to benchmark the San Miguel scene.

6. Optional: Build the application yourself.

- Open rt.sln in Visual Studio 2010.
- Right-click the "rt" project and select "Set as StartUp Project".
- Select the Release build. The Debug build is very slow, especially when
sorting secondary rays during the benchmark.
- Build and run.
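The manual checks from step 4 can be sketched as a small pre-flight script. This is only an illustration of the three conditions listed there, not the application's own nvcc detection logic. The function name check_cuda_environment and the wording of its messages are hypothetical. The script assumes that running vcvars32.bat makes cl.exe reachable through PATH.

```python
import os
import shutil
from pathlib import Path


def check_cuda_environment(env=None):
    """Return a list of problems; an empty list means all checks passed.

    `env` is a mapping of environment variables (defaults to os.environ).
    Hypothetical helper: it mirrors the README's manual steps only.
    """
    env = os.environ if env is None else env
    problems = []

    # CUDA_BIN_PATH should point at the toolkit "bin" directory holding nvcc.exe.
    bin_path = env.get("CUDA_BIN_PATH")
    if not bin_path:
        problems.append("CUDA_BIN_PATH is not set")
    elif not (Path(bin_path) / "nvcc.exe").is_file():
        problems.append("nvcc.exe not found in CUDA_BIN_PATH")

    # CUDA_INC_PATH should point at the toolkit "include" directory.
    inc_path = env.get("CUDA_INC_PATH")
    if not inc_path:
        problems.append("CUDA_INC_PATH is not set")
    elif not Path(inc_path).is_dir():
        problems.append("CUDA_INC_PATH is not a directory")

    # Assumption: after vcvars32.bat has run, cl.exe can be found via PATH.
    if shutil.which("cl.exe", path=env.get("PATH", "")) is None:
        problems.append("cl.exe not found on PATH (run vcvars32.bat)")

    return problems
```

Calling check_cuda_environment() with no arguments inspects the current process environment. Printing the returned list before launching rt.exe shows which of the three settings still needs attention.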


Test setup
----------

Camera positions:

- Results of each scene are averaged over 5 distinct camera positions.
- Camera positions are specified on the command line using
"signature strings".
- To generate a signature string, click "Show camera controls" in the
interactive mode and then "Export camera signature..."

Ray generation:

- Viewport of 1024x768 pixels.
- Primary rays that hit scene geometry generate 32 secondary rays (AO or
diffuse) distributed according to a cosine-weighted Halton sequence on
the hemisphere.
- Primary rays that miss the geometry generate dummy secondary rays that
are ignored by the ray traversal kernel and excluded from the
rays-per-second figures.
- AO rays are of limited length and terminate immediately after encountering
an intersection.
- Diffuse interreflection rays are very long and continue the traversal
until they find the closest intersection.
- See src/rt/ray/RayGen.cpp
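The secondary ray distribution described above can be illustrated with a minimal sketch. It assumes Halton bases 2 and 3 and a local frame whose +Z axis is the surface normal. The actual code in src/rt/ray/RayGen.cpp may use different bases, scrambling, or a different mapping. All function names here are hypothetical.

```python
import math


def radical_inverse(index, base):
    """Van der Corput radical inverse of `index` in `base`, a value in [0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def cosine_weighted_direction(u, v):
    """Map (u, v) in [0, 1)^2 to a unit vector on the +Z hemisphere.

    The resulting density is proportional to cos(theta), where theta is
    the angle to the +Z axis (assumed here to be the surface normal).
    """
    r = math.sqrt(u)
    phi = 2.0 * math.pi * v
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u)))


def secondary_directions(count=32):
    """Directions for `count` secondary rays from 2D Halton points (bases 2, 3)."""
    return [
        cosine_weighted_direction(radical_inverse(i, 2), radical_inverse(i, 3))
        for i in range(1, count + 1)
    ]
```

In a renderer, each local direction would then be rotated into the frame of the hit point's normal. For AO rays the ray length would additionally be capped at the --ao-radius value.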

Batches and sorting:

- All primary rays are traced in a single launch.
- Secondary rays are divided into batches of 2^20 rays, and each batch
is traced in a separate launch.
- Primary rays are generated according to 2D Morton order in screen space.
- Each batch of secondary rays is sorted according to 6D Morton order based
on ray origin and direction vectors.
- The sorting is performed on the CPU and is quite a heavy operation,
taking roughly one second per batch. It is disabled in interactive mode.
- See src/rt/ray/PixelTable.cpp and src/rt/ray/RayBuffer.cpp
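As an illustration of the ordering and batching above, the sketch below builds Morton codes by bit interleaving and splits a ray list into batches of 2^20. The quantization depth (5 bits per component), the order of the six components, and all function names are assumptions made for this example. The real implementation is in the source files listed above.

```python
BATCH_SIZE = 1 << 20  # 2^20 rays per launch, as stated in the README


def interleave_bits(coords, bits):
    """Interleave the low `bits` bits of each integer in `coords`.

    With two coordinates this yields a 2D Morton code, with six a 6D one.
    """
    code = 0
    dims = len(coords)
    for bit in range(bits):
        for dim, value in enumerate(coords):
            code |= ((value >> bit) & 1) << (bit * dims + dim)
    return code


def quantize(x, lo, hi, bits):
    """Map x in [lo, hi] to an integer in [0, 2**bits - 1] (requires hi > lo)."""
    t = (x - lo) / (hi - lo)
    return min((1 << bits) - 1, max(0, int(t * (1 << bits))))


def ray_morton_key(origin, direction, scene_lo, scene_hi, bits=5):
    """6D sort key from a ray origin (in scene bounds) and a unit direction."""
    o = [quantize(origin[i], scene_lo[i], scene_hi[i], bits) for i in range(3)]
    d = [quantize(direction[i], -1.0, 1.0, bits) for i in range(3)]
    return interleave_bits(o + d, bits)


def split_into_batches(rays, batch_size=BATCH_SIZE):
    """Split a list of rays into consecutive batches of at most `batch_size`."""
    return [rays[i:i + batch_size] for i in range(0, len(rays), batch_size)]


def num_secondary_batches(width=1024, height=768, rays_per_pixel=32):
    """Number of batches if every pixel contributes `rays_per_pixel` ray slots."""
    total = width * height * rays_per_pixel
    return -(-total // BATCH_SIZE)  # ceiling division
```

Under the assumption that every pixel occupies 32 secondary ray slots (including the dummy rays for pixels that miss the geometry), a 1024x768 viewport gives 1024 * 768 * 32 = 24 * 2^20 rays, which is 24 batches. Sorting a batch then amounts to sorted(batch, key=...) with ray_morton_key as the key function.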

Acceleration structure:

- AABB-based binary bounding volume hierarchy.
- Built using spatial triangle splits (Stich et al.) to improve quality:
http://www.nvidia.com/docs/IO/77714/sbvh.pdf
- Triangles are represented using Woop's affine triangle transformation:
http://www.sven-woop.de/publications/Diplom_SvenWoop_Final.pdf
- Memory layout varies between individual traversal kernels.
- See src/rt/bvh/SplitBVHBuilder.cpp and src/cuda/CudaBVH.cpp
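For readers unfamiliar with AABB-based hierarchies, the following is a minimal sketch of the standard "slab" test that decides whether a ray enters a node's bounding box. It is a generic textbook formulation, not a transcription of the CUDA kernels in this package. The helper names and the epsilon used to avoid division by zero are assumptions of this example.

```python
import math


def safe_inverse(direction, eps=1e-20):
    """Per-component reciprocal of a direction, clamping near-zero components.

    Clamping to +/-eps avoids division by zero for axis-parallel rays.
    """
    return tuple(
        1.0 / (d if abs(d) > eps else math.copysign(eps, d)) for d in direction
    )


def ray_aabb_interval(origin, inv_dir, box_min, box_max, t_min=0.0, t_max=math.inf):
    """Return the (t_enter, t_exit) overlap of a ray with a box, or None on a miss.

    The ray is origin + t * direction for t in [t_min, t_max]; `inv_dir`
    is the output of safe_inverse(direction).
    """
    for axis in range(3):
        t0 = (box_min[axis] - origin[axis]) * inv_dir[axis]
        t1 = (box_max[axis] - origin[axis]) * inv_dir[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the valid interval to the overlap with this axis's slab.
        t_min = max(t_min, t0)
        t_max = min(t_max, t1)
        if t_min > t_max:
            return None
    return (t_min, t_max)
```

During traversal, a test of this kind would be applied to the boxes of both children of an inner node. Passing the distance of the closest hit found so far as t_max lets the traversal skip subtrees that lie behind it.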

Ray traversal:

- Performance results are based on the time spent in ray traversal for
the selected ray type. Ray generation and sorting are excluded from
the measurements.
- The code for launching the traversal kernels can be found in
src/cuda/CudaTracer.cpp
- The kernels themselves are located in src/rt/kernels:

fermi_speculative_while_while
Hand-tuned to yield the best performance on GTX 480.
Works on older GPUs as well, but is not optimal.

kepler_dynamic_fetch
Hand-tuned to yield the best performance on GTX 680.
Works on older GPUs as well, but is not optimal.

tesla_persistent_packet
"Persistent packet" kernel from the paper.

tesla_persistent_speculative_while_while
"Persistent speculative while-while" kernel from the paper.
This is the fastest kernel on GTX 285.

tesla_persistent_while_while
"Persistent while-while" kernel from the paper.


Version history
---------------

Version 1.4, May 22, 2012
- Include hand-tuned kernels for Kepler-based GPUs.
- Improve fermi_speculative_while_while perf using vmin/vmax PTX instructions.
- Include San Miguel test scene in the package.
- Improve robustness of the BVH builder with degenerate input.
- Switch to New BSD License (previously Apache License 2.0).
- Upgrade to Visual Studio 2010 (previously 2008).
- Fix a CUDA compilation issue with Visual Studio Express.
- General bugfixes and improvements to framework.

Version 1.3, Jul 08, 2011
- Fix compatibility issues with CUDA 4.0.

Version 1.2, Dec 17, 2010
- Fix issues with nvcc path autodetection with CUDA 3.2.

Version 1.1, Dec 01, 2010
- Update the codebase to support GF104 and CUDA 3.2.
- Speed up ray sorting significantly by utilizing all available CPU cores.
- Minor stability improvements.

Version 1.0, Jun 29, 2010
- Initial release.


Known issues
------------

- When using CUDA 3.2 or later, the performance of device-side code drops
slightly in 64-bit builds. This is because CUDA 3.2 disallows "mixed-bitness
mode", which we utilize on earlier CUDA versions to get maximum performance.
With CUDA 3.2, we must always compile device code with the same bitness
as host code, which generally results in higher register pressure in 64-bit
builds.

For more information, see:
http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_3.2_Readiness_Tech_Brief.pdf

- The mesh importer only supports a limited subset of the Wavefront OBJ
file format. If you have trouble importing a mesh, you may want to try
enabling WAVEFRONT_DEBUG in src/framework/io/MeshWavefrontIO.cpp.


Acknowledgements
----------------

Anat Grynberg and Greg Ward for the Conference room model.
University of Utah for the Fairy scene.
Marko Dabrovic (www.rna.hr) for the Sibenik cathedral model.
Guillermo M. Leal Llaguno (www.evvisual.com) for the San Miguel model.
28 changes: 28 additions & 0 deletions benchmark.cmd
@@ -0,0 +1,28 @@
@echo off
set LOG=benchmark.log

rem Find executable.

set EXE=rt_x64_Release.exe
if not exist %EXE% set EXE=rt_Win32_Release.exe
if not exist %EXE% set EXE=rt.exe

rem Benchmark conference, fairyforest, and sibenik.

%EXE% benchmark --log=%LOG% --mesh=scenes/rt/conference/conference.obj --sbvh-alpha=1.0e-5 --ao-radius=5 --camera="6omr/04j3200bR6Z/0/3ZEAz/x4smy19///c/05frY109Qx7w////m100" --camera="Lpmr/07k3200CS6Z/0/QqOIz1qfnsx19///c/05frY109Qx7w////m100" --camera="Y1BR00IkZd/0aA9X/0/Gy8Px1ca7Tw19///c/05frY109Qx7w////m100" --camera="XYDl00Gqv600byxY/00IQE4x/jN1jx/9///c/05frY109Qx7w////m100" --camera="w:ie00yxXX00ND1b/03TZ6qy1egt3x/9///c/05frY109Qx7w////m100"
%EXE% benchmark --log=%LOG% --mesh=scenes/rt/fairyforest/fairyforest.obj --sbvh-alpha=1.0e-5 --ao-radius=0.3 --camera="cIxMx/sK/Ty/EFu3z/5m9mWx/YPA5z/8///m007toC10AnAHx///Uy200" --camera="KI/Qz/zlsUy/TTy6z13BdCZy/LRxzy/8///m007toC10AnAHx///Uy200" --camera="mF5Gz1SuO1z/ZMooz11Q0bGz/CCNxx18///m007toC10AnAHx///Uy200" --camera="vH7Jy19GSHx/YN45x//P2Wpx1MkhWy18///m007toC10AnAHx///Uy200" --camera="ViGsx/KxTFz/Ypn8/05TJTmx1ljevx18///m007toC10AnAHx///Uy200"
%EXE% benchmark --log=%LOG% --mesh=scenes/rt/sibenik/sibenik.obj --sbvh-alpha=1.0e-5 --ao-radius=5 --camera="ytIa02G35kz1i:ZZ/0//iSay/5W6Ex19///c/05frY109Qx7w////m100" --camera=":Wp802ACAD/2x9OQ/0/waE8z/IOKbx/9///c/05frY109Qx7w////m100" --camera="CFtpy/s6ea/28btX/0172CFy/K5g1z/9///c/05frY109Qx7w////m100" --camera="steO/0TlN1z1tsDg/03InaMz/bqZxx/9///c/05frY109Qx7w////m100" --camera="HJv//034:Rx1S4Xh/03dpXux1BVmGw/9///c/05frY109Qx7w////m100"

rem Benchmark San Miguel.
rem - Requires 64-bit build to avoid running out of CPU virtual address space.
rem - Do not use Tesla kernels, since they require more than 1.5 GB of GPU memory for the BVH.

if "%EXE%"=="rt_x64_Release.exe" goto run_sanmiguel
echo San Miguel requires 64-bit build. Skipping.
goto done

:run_sanmiguel
%EXE% benchmark --log=%LOG% --mesh=scenes/rt/sanmiguel/sanmiguel.obj --sbvh-alpha=1.0e-6 --ao-radius=1.5 --kernel=fermi_speculative_while_while --kernel=kepler_dynamic_fetch --camera="Yciwz1oRQmz/Xvsm005CwjHx/b70nx18tVI7005frY108Y/:x/v3/z100" --camera="NhL2/2tO1w/0OIZh005DPZMz/xC9Cz18tVI7005frY108Y/:x/v3/z100" --camera="AbE3/0LWiZz/4Ccj005X5X1z1qJ13x/8BfRky/5frY108Y/:x/v3/z100"

:done
echo Done.
