CUDA Rasterizer

CLICK ME FOR INSTRUCTION OF THIS PROJECT

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4

Joseph Klinger
Tested on: Windows 10, i5-7300HQ (4 CPUs) @ ~2.50GHz, GTX 1050 6030MB (Personal Machine)

README

This week, I took on the task of implementing a rasterizer in CUDA. I have already written a CPU rasterizer (almost 2 years ago, in the introductory graphics course CIS 460), but implementing a basic graphics pipeline on the GPU was a different beast.

The features included in this rasterizer are:

Texture mapping
Supersampling Antialiasing
Color interpolation across triangles
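As a rough illustration of the color interpolation listed above (a sketch, not the code from this repository), per-vertex colors can be blended with barycentric weights. The names `Vec2`, `Color`, `signedArea2`, `barycentric`, and `interpolateColor` are hypothetical. The sketch assumes a non-degenerate triangle and plain screen-space interpolation, without perspective correction.

```cpp
#include <array>

struct Vec2 { float x, y; };
struct Color { float r, g, b; };

// Twice the signed area of triangle (a, b, c): the 2D analogue of a cross product.
inline float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Barycentric weights of point p with respect to triangle (v0, v1, v2).
// Assumes the triangle has non-zero area.
inline std::array<float, 3> barycentric(Vec2 p, Vec2 v0, Vec2 v1, Vec2 v2) {
    const float area = signedArea2(v0, v1, v2);
    return {signedArea2(p, v1, v2) / area,
            signedArea2(v0, p, v2) / area,
            signedArea2(v0, v1, p) / area};
}

// Blend the three vertex colors using the weights; the weights sum to 1.
inline Color interpolateColor(const std::array<float, 3>& w,
                              Color c0, Color c1, Color c2) {
    return {w[0] * c0.r + w[1] * c1.r + w[2] * c2.r,
            w[0] * c0.g + w[1] * c1.g + w[2] * c2.g,
            w[0] * c0.b + w[1] * c1.b + w[2] * c2.b};
}
```

A point is inside the triangle when all three weights are non-negative, so the same weights can serve both the coverage test and the interpolation.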

Demo video here.

Rasterization, in brief, takes a 3D shape and decides how to color the pixels that the shape overlaps. In this project, that involves transforming the input GLTF models' vertex data, assembling triangles from that data, projecting the triangles from view space to clip space, then to NDC/screen space, then to viewport space, computing line intersections with the edges of each triangle, and shading the overlapping fragments.
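The last two projection steps can be sketched as follows. This is a minimal illustration under stated assumptions, not this project's code: `Vec3`, `Vec4`, `clipToNdc`, and `ndcToViewport` are hypothetical names, NDC is assumed to span [-1, 1] on each axis, and the y axis is flipped so that +y in NDC maps to the top row of the image.

```cpp
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Perspective divide: clip space -> normalized device coordinates (NDC).
// Assumes w is non-zero (the vertex is not on the camera plane).
inline Vec3 clipToNdc(Vec4 clip) {
    return {clip.x / clip.w, clip.y / clip.w, clip.z / clip.w};
}

// Viewport transform: NDC in [-1, 1] -> pixel coordinates.
// The y flip is an assumed convention (row 0 at the top of the image).
inline Vec3 ndcToViewport(Vec3 ndc, int width, int height) {
    return {(ndc.x + 1.0f) * 0.5f * static_cast<float>(width),
            (1.0f - ndc.y) * 0.5f * static_cast<float>(height),
            ndc.z};
}
```

For example, a clip-space point at the NDC origin lands at the center of an 800x600 viewport, (400, 300). In a CUDA version, each vertex would typically be handled by its own thread.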

Here is an image of the given Duck GLTF model rasterized with texture mapping:

For comparison, here is the same Duck rendered with SSAA (supersampling antialiasing). SSAA simply renders to an image of higher resolution than the screen, then downsamples that image into the final one:
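The downsampling step could look roughly like the box filter below. This is a hedged sketch rather than the repository's implementation: `Color`, `downsample`, and the `factor` parameter (supersampling rate per axis) are hypothetical, and the high-resolution image is assumed to be stored row-major.

```cpp
#include <vector>

struct Color { float r, g, b; };

// Box-filter downsample. `hi` is a row-major image of size
// (width * factor) x (height * factor); the result is width x height.
std::vector<Color> downsample(const std::vector<Color>& hi,
                              int width, int height, int factor) {
    const int hiWidth = width * factor;
    const float inv = 1.0f / static_cast<float>(factor * factor);
    std::vector<Color> out(width * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Color sum{0.0f, 0.0f, 0.0f};
            // Average the factor x factor block of samples behind this pixel.
            for (int sy = 0; sy < factor; ++sy) {
                for (int sx = 0; sx < factor; ++sx) {
                    const Color& s =
                        hi[(y * factor + sy) * hiWidth + (x * factor + sx)];
                    sum.r += s.r;
                    sum.g += s.g;
                    sum.b += s.b;
                }
            }
            out[y * width + x] = {sum.r * inv, sum.g * inv, sum.b * inv};
        }
    }
    return out;
}
```

In a CUDA version, the two outer loops would usually become one thread per output pixel, with the inner loops left unchanged.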

Performance Analysis

I benchmarked my rasterizer's performance using the Duck GLTF model, which has ~4000 triangles, at a close-up and a far zoom level. Here are the results:

As we can see, rasterization is by far the most expensive stage, well ahead of vertex transform, primitive assembly, fragment shading, and downsampling. As expected, SSAA makes rasterization much more costly: we have to render to an image twice the size of the final one, so more fragments must be computed and checked with the depth test. Lastly, the far zoom level makes rasterization more costly, as there are simply more fragments overlapping each triangle.

One experiment I tried was comparing rasterization performance when computing line intersections with the triangle edges, as opposed to simply checking every fragment within the bounding box of the primitive. As expected, the intersection approach improved performance: it avoids computing barycentric weights for every potential fragment and replaces that work with a few lines of line intersection code, where the most expensive operation is a divide (as opposed to a cross product).
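One way to picture the edge-intersection approach is to intersect each row of pixels with the three triangle edges and keep the resulting x range. The sketch below is an assumption about how such a test could be written, not the code used for the benchmark; `Vec2` and `scanlineSpan` are hypothetical names.

```cpp
#include <algorithm>
#include <limits>

struct Vec2 { float x, y; };

// Finds the x range [xMin, xMax] where the horizontal line at height y
// crosses the triangle. Returns false if the line misses the triangle.
inline bool scanlineSpan(float y, const Vec2 tri[3], float& xMin, float& xMax) {
    xMin = std::numeric_limits<float>::max();
    xMax = std::numeric_limits<float>::lowest();
    bool hit = false;
    for (int i = 0; i < 3; ++i) {
        const Vec2 a = tri[i];
        const Vec2 b = tri[(i + 1) % 3];
        if (a.y == b.y) continue;  // horizontal edge: the other two edges bound the span
        // One divide per edge: parametric position of the crossing along a -> b.
        const float t = (y - a.y) / (b.y - a.y);
        if (t < 0.0f || t > 1.0f) continue;  // crossing lies outside the edge
        const float x = a.x + t * (b.x - a.x);
        xMin = std::min(xMin, x);
        xMax = std::max(xMax, x);
        hit = true;
    }
    return hit;
}
```

Only pixels whose x coordinate falls inside the returned span need a depth test and shading, whereas a bounding-box loop evaluates a coverage test for every pixel in the box.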
