Initial commit

matt77hias · Apr 6, 2016 · commit 0e705a1

Showing 134 changed files with 42,053 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,106 @@
+scenes/
+
+## Ignore Visual Studio temporary files, build results, and
+## files generated by popular Visual Studio add-ons.
+
+# User-specific files
+*.suo
+*.user
+*.userosscache
+*.sln.docstates
+
+# User-specific files (MonoDevelop/Xamarin Studio)
+*.userprefs
+
+# Build results
+[Dd]ebug/
+[Dd]ebugPublic/
+[Rr]elease/
+[Rr]eleases/
+x64/
+x86/
+bld/
+[Bb]in/
+[Oo]bj/
+[Ll]og/
+
+# Visual Studio 2015 cache/options directory
+.vs/
+# Uncomment if you have tasks that create the project's static files in wwwroot
+#wwwroot/
+
+# MSTest test Results
+[Tt]est[Rr]esult*/
+[Bb]uild[Ll]og.*
+
+# NUNIT
+*.VisualState.xml
+TestResult.xml
+
+# Build Results of an ATL Project
+[Dd]ebugPS/
+[Rr]eleasePS/
+dlldata.c
+
+# DNX
+project.lock.json
+artifacts/
+
+*_i.c
+*_p.c
+*_i.h
+*.ilk
+*.meta
+*.obj
+*.pch
+*.pdb
+*.pgc
+*.pgd
+*.rsp
+*.sbr
+*.tlb
+*.tli
+*.tlh
+*.tmp
+*.tmp_proj
+*.log
+*.vspscc
+*.vssscc
+.builds
+*.pidb
+*.svclog
+*.scc
+
+# Visual C++ cache files
+ipch/
+*.aps
+*.ncb
+*.opendb
+*.opensdf
+*.sdf
+*.cachefile
+*.VC.db
+*.VC.VC.opendb
+
+# Visual Studio profiler
+*.psess
+*.vsp
+*.vspx
+*.sap
+
+# Click-Once directory
+publish/
+
+# Visual Studio cache files
+# files ending in .cache can be ignored
+*.[Cc]ache
+# but keep track of directories ending in .cache
+!*.[Cc]ache/
+
+# Backup & report files from converting an old project file
+# to a newer Visual Studio version. Backup files are not needed,
+# because we have git ;-)
+_UpgradeReport_Files/
+Backup*/
+UpgradeLog*.XML
+UpgradeLog*.htm
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,24 @@
+Copyright (c) 2009-2011, NVIDIA Corporation
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    * Neither the name of NVIDIA Corporation nor the
+      names of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/README b/README
@@ -0,0 +1,219 @@
+
+Fast GPU Ray Traversal 1.4
+--------------------------
+Implementation by Tero Karras, Timo Aila, and Samuli Laine
+Copyright 2009-2012 NVIDIA Corporation
+
+This package contains full source code for the fast GPU-based ray traversal
+routines used in the following paper:
+
+    "Understanding the Efficiency of Ray Traversal on GPUs",
+    Timo Aila and Samuli Laine,
+    Proc. High-Performance Graphics 2009
+    http://www.tml.tkk.fi/~timo/publications/aila2009hpg_paper.pdf
+
+In addition to the original kernels that were optimized for NVIDIA GTX 285,
+the package also includes kernels specifically hand-tuned for GTX 480
+(GF100/Fermi) and GTX 680 (GK104/Kepler). The results for these GPUs have
+been published in the following technical report:
+
+    "Understanding the Efficiency of Ray Traversal on GPUs - Kepler and Fermi Addendum",
+    Timo Aila, Samuli Laine, and Tero Karras,
+    NVIDIA Technical Report NVR-2012-02,
+    http://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum
+
+The accompanying benchmark application and test scenes aim to replicate the
+published results as accurately as possible, although there are slight
+differences in the test setup (e.g. BVH builder, CUDA version).
+See results.txt for details.
+
+The source code is licensed under the New BSD License (see LICENSE) and is
+hosted on Google Code:
+
+http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/
+
+
+System requirements
+-------------------
+
+- Microsoft Windows XP, Vista, or 7.
+
+- At least 1 GB of system memory.
+
+- NVIDIA CUDA-compatible GPU with compute capability 1.2 or higher and at
+  least 1.5 GB of GPU memory. GeForce GTX 480 or GTX 680 is recommended.
+
+- Microsoft Visual Studio 2010. Required even if you do not plan to build
+  the source code, as the runtime CUDA compilation mechanism depends on it.
+
+
+Instructions
+------------
+
+1. Install Visual Studio 2010. The Express edition can be downloaded from:
+   http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-cpp-express
+
+2. Install the latest NVIDIA GPU drivers and CUDA Toolkit.
+   http://developer.nvidia.com/object/cuda_archive.html
+
+3. Run rt.exe to start the application in interactive mode. The first
+   run executes certain initialization tasks that may take a while to
+   complete.
+
+4. If you get an error during initialization, the most probable explanation
+   is that the application is unable to launch nvcc.exe contained in the
+   CUDA Toolkit. In this case, you should:
+
+   - Set CUDA_BIN_PATH to point to the CUDA Toolkit "bin" directory, e.g.
+     "set CUDA_BIN_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin".
+
+   - Set CUDA_INC_PATH to point to the CUDA Toolkit "include" directory, e.g.
+     "set CUDA_INC_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include".
+
+   - Run vcvars32.bat to set up Visual Studio paths, e.g.
+     "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\vcvars32.bat".
+
+5. Run benchmark.cmd to measure the performance of all test scenes with
+   the same settings that were used in the paper. Expected performance
+   numbers for different GPUs are listed in "results.txt". Note that a
+   64-bit build is required to benchmark the San Miguel scene.
+
+6. Optional: Build the application yourself.
+
+   - Open rt.sln in Visual Studio 2010.
+   - Right-click the "rt" project and select "Set as StartUp Project".
+   - Select the Release build. The Debug build is very slow, especially when
+     sorting secondary rays during the benchmark.
+   - Build and run.
+
+
+Test setup
+----------
+
+Camera positions:
+
+- Results of each scene are averaged over 5 distinct camera positions.
+- Camera positions are specified on the command line using
+  "signature strings".
+- To generate a signature string, click "Show camera controls" in the
+  interactive mode and then "Export camera signature..."
+
+Ray generation:
+
+- Viewport of 1024x768 pixels.
+- Primary rays that hit scene geometry generate 32 secondary rays (AO or
+  diffuse) distributed according to a cosine-weighted Halton sequence on
+  the hemisphere.
+- Primary rays that miss the geometry generate dummy secondary rays that
+  are ignored by the ray traversal kernel and excluded from the
+  rays-per-second figures.
+- AO rays are of limited length and terminate immediately after encountering
+  an intersection.
+- Diffuse interreflection rays are very long and continue the traversal
+  until they find the closest intersection.
+- See src/rt/ray/RayGen.cpp
+
+Batches and sorting:
+
+- All primary rays are traced in a single launch.
+- Secondary rays are divided into batches of 2^20 rays, and each batch
+  is traced in a separate launch.
+- Primary rays are generated according to 2D Morton order in screen space.
+- Each batch of secondary rays is sorted according to 6D Morton order based
+  on ray origin and direction vectors.
+- The sorting is performed by the CPU and is quite a heavy operation,
+  taking roughly one second per batch. It is disabled in the interactive
+  mode.
+- See src/rt/ray/PixelTable.cpp and src/rt/ray/RayBuffer.cpp
+
+Acceleration structure:
+
+- AABB-based binary bounding volume hierarchy.
+- Built using spatial triangle splits (Stich et al.) to improve quality:
+  http://www.nvidia.com/docs/IO/77714/sbvh.pdf
+- Triangles are represented using Woop's affine triangle transformation:
+  http://www.sven-woop.de/publications/Diplom_SvenWoop_Final.pdf
+- Memory layout varies between individual traversal kernels.
+- See src/rt/bvh/SplitBVHBuilder.cpp and src/cuda/CudaBVH.cpp
+
+Ray traversal:
+
+- Performance results are based on the time spent in ray traversal for
+  the selected ray type. Ray generation and sorting are excluded from
+  the measurements.
+- The code for launching the traversal kernels can be found in
+  src/cuda/CudaTracer.cpp
+- The kernels themselves are located in src/rt/kernels:
+
+  fermi_speculative_while_while
+    Hand-tuned to yield the best performance on GTX 480.
+    Works on older GPUs as well, but is not optimal.
+
+  kepler_dynamic_fetch
+    Hand-tuned to yield the best performance on GTX 680.
+    Works on older GPUs as well, but is not optimal.
+
+  tesla_persistent_packet
+    "Persistent packet" kernel from the paper.
+
+  tesla_persistent_speculative_while_while
+    "Persistent speculative while-while" kernel from the paper.
+    This is the fastest kernel on GTX 285.
+
+  tesla_persistent_while_while
+    "Persistent while-while" kernel from the paper.
+
+
+Version history
+---------------
+
+Version 1.4, May 22, 2012
+- Include hand-tuned kernels for Kepler-based GPUs.
+- Improve fermi_speculative_while_while perf using vmin/vmax PTX instructions.
+- Include San Miguel test scene in the package.
+- Improve robustness of the BVH builder with degenerate input.
+- Switch to New BSD License (previously Apache License 2.0).
+- Upgrade to Visual Studio 2010 (previously 2008).
+- Fix a CUDA compilation issue with Visual Studio Express.
+- General bugfixes and improvements to framework.
+
+Version 1.3, Jul 08, 2011
+- Fix compatibility issues with CUDA 4.0.
+
+Version 1.2, Dec 17, 2010
+- Fix issues with nvcc path autodetection with CUDA 3.2.
+
+Version 1.1, Dec 01, 2010
+- Update the codebase to support GF104 and CUDA 3.2.
+- Speed up ray sorting significantly by utilizing all available CPU cores.
+- Minor stability improvements.
+
+Version 1.0, Jun 29, 2010
+- Initial release.
+
+
+Known issues
+------------
+
+- When using CUDA 3.2 or later, the performance of device-side code drops
+  slightly in 64-bit builds. This is because CUDA 3.2 disallows "mixed-bitness
+  mode", which we utilize on earlier CUDA versions to get maximum performance.
+  With CUDA 3.2, we must always compile device code with the same bitness
+  as host code, which generally results in higher register pressure in 64-bit
+  builds.
+
+  For more information, see:
+  http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_3.2_Readiness_Tech_Brief.pdf
+
+- The mesh importer only supports a limited subset of the Wavefront OBJ
+  file format. If you have trouble importing a mesh, you may want to try
+  enabling WAVEFRONT_DEBUG in src/framework/io/MeshWavefrontIO.cpp.
+
+
+Acknowledgements
+----------------
+
+Anat Grynberg and Greg Ward for the Conference room model.
+University of Utah for the Fairy scene.
+Marko Dabrovic (www.rna.hr) for the Sibenik cathedral model.
+Guillermo M. Leal Llaguno (www.evvisual.com) for the San Miguel model.
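
Note on the README's "Batches and sorting" section: it states that primary rays are generated in 2D Morton order in screen space. The sketch below illustrates what such an ordering key can look like. It is not taken from src/rt/ray/PixelTable.cpp; the names spreadBits16 and mortonEncode2D are hypothetical, and the code assumes pixel coordinates that fit in 16 bits (true for the 1024x768 viewport mentioned in the README).

```cpp
#include <cstdint>

// Spread the low 16 bits of v so that input bit i ends up at output bit 2*i.
// The odd bit positions are left as zero.
inline std::uint32_t spreadBits16(std::uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Interleave the bits of x (even positions) and y (odd positions) to form
// a 2D Morton key. Sorting pixels by this key visits them along a Z-curve,
// so pixels that are close in the ordering tend to be close on screen.
inline std::uint32_t mortonEncode2D(std::uint32_t x, std::uint32_t y)
{
    return spreadBits16(x) | (spreadBits16(y) << 1);
}
```

With this key, the four pixels of a 2x2 block (0,0), (1,0), (0,1), (1,1) receive the consecutive keys 0, 1, 2, 3, which is the locality property that makes the ordering attractive for coherent ray batches. The 6D ordering the README describes for secondary rays would interleave six quantized components (origin and direction) in the same manner.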
diff --git a/benchmark.cmd b/benchmark.cmd
@@ -0,0 +1,28 @@
+@echo off
+set LOG=benchmark.log
+
+rem Find executable.
+
+set EXE=rt_x64_Release.exe
+if not exist %EXE% set EXE=rt_Win32_Release.exe
+if not exist %EXE% set EXE=rt.exe
+
+rem Benchmark conference, fairyforest, and sibenik.
+
+%EXE% benchmark --log=%LOG% --mesh=scenes/rt/conference/conference.obj --sbvh-alpha=1.0e-5 --ao-radius=5 --camera="6omr/04j3200bR6Z/0/3ZEAz/x4smy19///c/05frY109Qx7w////m100" --camera="Lpmr/07k3200CS6Z/0/QqOIz1qfnsx19///c/05frY109Qx7w////m100" --camera="Y1BR00IkZd/0aA9X/0/Gy8Px1ca7Tw19///c/05frY109Qx7w////m100" --camera="XYDl00Gqv600byxY/00IQE4x/jN1jx/9///c/05frY109Qx7w////m100" --camera="w:ie00yxXX00ND1b/03TZ6qy1egt3x/9///c/05frY109Qx7w////m100"
+%EXE% benchmark --log=%LOG% --mesh=scenes/rt/fairyforest/fairyforest.obj --sbvh-alpha=1.0e-5 --ao-radius=0.3 --camera="cIxMx/sK/Ty/EFu3z/5m9mWx/YPA5z/8///m007toC10AnAHx///Uy200" --camera="KI/Qz/zlsUy/TTy6z13BdCZy/LRxzy/8///m007toC10AnAHx///Uy200" --camera="mF5Gz1SuO1z/ZMooz11Q0bGz/CCNxx18///m007toC10AnAHx///Uy200" --camera="vH7Jy19GSHx/YN45x//P2Wpx1MkhWy18///m007toC10AnAHx///Uy200" --camera="ViGsx/KxTFz/Ypn8/05TJTmx1ljevx18///m007toC10AnAHx///Uy200"
+%EXE% benchmark --log=%LOG% --mesh=scenes/rt/sibenik/sibenik.obj --sbvh-alpha=1.0e-5 --ao-radius=5 --camera="ytIa02G35kz1i:ZZ/0//iSay/5W6Ex19///c/05frY109Qx7w////m100" --camera=":Wp802ACAD/2x9OQ/0/waE8z/IOKbx/9///c/05frY109Qx7w////m100" --camera="CFtpy/s6ea/28btX/0172CFy/K5g1z/9///c/05frY109Qx7w////m100" --camera="steO/0TlN1z1tsDg/03InaMz/bqZxx/9///c/05frY109Qx7w////m100" --camera="HJv//034:Rx1S4Xh/03dpXux1BVmGw/9///c/05frY109Qx7w////m100"
+
+rem Benchmark San Miguel.
+rem - Requires 64-bit build to avoid running out of CPU virtual address space.
+rem - Do not use Tesla kernels, since they require more than 1.5 GB of GPU memory for the BVH.
+
+if "%EXE%"=="rt_x64_Release.exe" goto run_sanmiguel
+echo San Miguel requires 64-bit build. Skipping.
+goto done
+
+:run_sanmiguel
+%EXE% benchmark --log=%LOG% --mesh=scenes/rt/sanmiguel/sanmiguel.obj --sbvh-alpha=1.0e-6 --ao-radius=1.5 --kernel=fermi_speculative_while_while --kernel=kepler_dynamic_fetch --camera="Yciwz1oRQmz/Xvsm005CwjHx/b70nx18tVI7005frY108Y/:x/v3/z100" --camera="NhL2/2tO1w/0OIZh005DPZMz/xC9Cz18tVI7005frY108Y/:x/v3/z100" --camera="AbE3/0LWiZz/4Ccj005X5X1z1qJ13x/8BfRky/5frY108Y/:x/v3/z100"
+
+:done
+echo Done.
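
Note on benchmark.cmd: the executable lookup and the San Miguel gate in the script above can be restated as the following C++ sketch. It is only an illustration of the batch logic, not code from the repository; pickExecutable, runsSanMiguel and the exists predicate are hypothetical names, with exists standing in for the batch "if not exist" check.

```cpp
#include <functional>
#include <string>

// Mirrors the lookup order in benchmark.cmd: prefer the 64-bit Release
// binary, then the 32-bit Release binary, then fall back to rt.exe.
// As in the script, the final fallback is not checked for existence.
std::string pickExecutable(const std::function<bool(const std::string&)>& exists)
{
    std::string exe = "rt_x64_Release.exe";
    if (!exists(exe)) exe = "rt_Win32_Release.exe";
    if (!exists(exe)) exe = "rt.exe";
    return exe;
}

// The script runs the San Miguel benchmark only when the selected name is
// exactly rt_x64_Release.exe. It compares names, not the actual bitness.
bool runsSanMiguel(const std::string& exe)
{
    return exe == "rt_x64_Release.exe";
}
```

One consequence of the name comparison is that a 64-bit binary saved as rt.exe would still cause the script to skip San Miguel.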