
Investigation: Sparse Resources #455

Open
litherum opened this issue Oct 2, 2019 · 6 comments

litherum commented Oct 2, 2019

Sparse resources are a way of using the virtual memory system on the GPU. It’s possible to create a resource that appears to be larger than physical memory, but only has a small portion of that resource actually backed by physical memory. These memory mappings are done at the tile level, where a single miplevel of a texture can be split into a collection of tiles.

Motivation

The benefit is that regions of textures that are unused don’t actually have to be mapped to physical memory, which means they don’t count toward texture memory budgets. This means that memory use is decreased for rendering at a particular quality level, or alternatively, quality is increased when rendering at a particular memory budget.

The benefit occurs when textures are large and include multiple objects (e.g. a texture atlas) but not all of those objects are rendered at the same miplevel. Placing each object into its own texture is unfortunate because resource tracking is performed for every resource, so the overhead associated with many tiny resources can be quite high. On the other hand, having a few giant resources without sparse resources leads to wasted memory, where entire miplevels of huge textures must all be resident. Sparse textures provide a nice middle ground, where resources can be large in virtual memory, but small in physical memory.

In order to characterize this, I used the Modern Rendering with Metal sample code from Apple. There is a simple switch to enable / disable sparse textures. Measuring the memory use shows:

[Screenshot: measured texture memory usage with sparse textures enabled vs. disabled]

On this particular sample, using sparse textures results in a 15% reduction of texture memory usage. This indicates that this feature is worth pursuing.

D3D12

There are 4 tiled resource tiers in D3D12. Tier 2 and above require that unmapped reads return 0 and unmapped writes do nothing. Applications can call ID3D12Device::CreateReservedResource() to create an unmapped resource. To actually make an allocation, an application can call ID3D12Device::CreateHeap().

The way an application associates a resource with memory from a heap is to call ID3D12CommandQueue::UpdateTileMappings(). This call lets an author specify that any particular tile region in the resource should be mapped to (or unmapped from) any particular region of a particular heap. It’s possible for a single resource to have mappings that come from distinct heaps, and it’s possible for a single page in a heap to back multiple resources.

The application can ask for information like the tile size and the total number of tiles in a resource by calling ID3D12Device::GetResourceTiling(). An application also can determine how much space it should allocate by calling a collection of functions like ID3D12Device::GetCopyableFootprints(), ID3D12Device::GetResourceAllocationInfo(), and IDXGIAdapter3::QueryVideoMemoryInfo().
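As a concrete illustration of the kind of numbers ID3D12Device::GetResourceTiling() reports, here is a sketch (not the D3D12 API itself) of the tile arithmetic for standard 64 KB tiles, assuming a format like BC7 where one tile holds 256x256 texels:

```cpp
#include <cstdint>

// Sketch only: number of standard 64 KB tiles needed to back one mip level,
// given the tile's texel dimensions for the format (e.g. 256x256 for BC7,
// which stores 1 byte per texel, so 256*256 = 65536 bytes = one 64 KB tile).
uint32_t tilesForMip(uint32_t mipWidth, uint32_t mipHeight,
                     uint32_t tileW, uint32_t tileH) {
    uint32_t tilesX = (mipWidth  + tileW - 1) / tileW;  // round up
    uint32_t tilesY = (mipHeight + tileH - 1) / tileH;
    return tilesX * tilesY;
}
```

For example, the top mip of a 4096x4096 BC7 texture needs 16x16 = 256 tiles, while a 300x300 mip still needs 2x2 tiles because partial tiles must be fully backed.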

There are additional functions in Shader Model 5 to assist in determining which tiles are necessary to render a scene. Sampling returns an additional optional output value which can be fed to CheckAccessFullyMapped() to determine if any of the samples in that sample operation were unmapped. There’s also a LOD clamp parameter so an application can restrict its sampling to occur from a LOD that is guaranteed to be mapped. The application can then perform its own accounting to determine which unmapped tiles have the most samples and are most important to load.
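That accounting step might look something like the following sketch, where the tile identifiers and feedback format are invented for illustration: the application accumulates, per unmapped tile, how many samples touched it, then loads the most-sampled tiles first.

```cpp
#include <cstdint>
#include <map>
#include <vector>
#include <algorithm>
#include <utility>

// Hypothetical tile identifier: (mip level, tile index within that mip).
using TileId = std::pair<uint32_t, uint32_t>;

// Given per-tile counts of samples that hit unmapped memory (gathered via
// CheckAccessFullyMapped()-style feedback), return the tiles to load next,
// most-sampled first, limited to a budget.
std::vector<TileId> tilesToLoad(const std::map<TileId, uint64_t>& unmappedSampleCounts,
                                size_t budget) {
    std::vector<std::pair<uint64_t, TileId>> ranked;
    for (const auto& [tile, count] : unmappedSampleCounts)
        ranked.push_back({count, tile});
    std::sort(ranked.begin(), ranked.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<TileId> result;
    for (size_t i = 0; i < ranked.size() && i < budget; ++i)
        result.push_back(ranked[i].second);
    return result;
}
```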

Vulkan

The VkPhysicalDeviceFeatures struct advertises support for 9 different sparse residency flags. To make a resource that can be sparse, applications can set the VK_IMAGE_CREATE_SPARSE_RESIDENCY_BIT flag when calling vkCreateImage(). To make a memory allocation, applications can call vkAllocateMemory() (just like every other allocation).

The way an application associates a resource with an allocation is to call vkQueueBindSparse(). This call lets an author specify that any particular tile region in the resource should be mapped to any particular region of a memory allocation.
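A minimal sketch of the CPU-side bookkeeping an application might keep alongside vkQueueBindSparse(), so it can later tell which allocation and offset back each tile. All the types here are invented stand-ins, not Vulkan types:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <tuple>

// Hypothetical tile coordinate within a sparse image.
struct TileCoord {
    uint32_t mip, x, y;
    bool operator<(const TileCoord& o) const {
        return std::tie(mip, x, y) < std::tie(o.mip, o.x, o.y);
    }
};

// Which allocation (modeled as an integer id) and offset back a tile.
struct Backing { uint32_t memoryId; uint64_t offset; };

class SparseBindTable {
    std::map<TileCoord, Backing> bound_;
public:
    void bind(TileCoord t, Backing b) { bound_[t] = b; }   // mirror of a bind op
    void unbind(TileCoord t)          { bound_.erase(t); } // mirror of an unbind
    std::optional<Backing> lookup(TileCoord t) const {
        auto it = bound_.find(t);
        if (it == bound_.end()) return std::nullopt;
        return it->second;
    }
};
```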

There are tons of different restrictions and configuration parameters an application needs to abide by when using sparse resources, spread across VkSparseImageFormatProperties, VkPhysicalDeviceSparseProperties, vkGetPhysicalDeviceSparseImageFormatProperties(), vkGetImageSparseMemoryRequirements(), and vkGetImageMemoryRequirements(). It’s all very complicated, and I couldn’t really follow all of it.

There isn’t really a good story regarding how an application knows how much space to allocate. vkGetImageSparseMemoryRequirements() and vkGetImageMemoryRequirements() get you part of the way there, but they don’t tell you the current memory pressure. VK_EXT_memory_budget adds functions which tell you this, but the extension is only present on 30% of Windows devices, 32% of Linux devices, and 0% of Android devices.

For SPIR-V, there are a collection of OpImageSparse* commands, which return a residency code in addition to the results of the operation. This residency code can be fed to OpImageSparseTexelsResident to determine whether or not the texels are all resident. These functions require the SparseResidency capability. There’s also a LOD clamp parameter so an application can restrict its sampling to occur from a LOD that is guaranteed to be mapped. The application can then perform its own accounting to determine which unmapped tiles have the most samples and are most important to load.

Metal

Metal is simpler than the other two APIs. To detect whether sparse textures are available, applications can call MTLDevice.supportsFamily(MTLGPUFamily.apple6). Applications first make an allocation by creating a MTLHeap with its type set to .sparse. Then, they can create a sparse texture associated with that heap by calling MTLHeap.makeTexture(). Multiple textures can be associated with a single heap.

Metal added a new type of encoder which governs the mapping between physical and virtual memory: MTLResourceStateCommandEncoder. This encoder only lets you map and unmap a particular region of a resource. You can’t specify which part of the heap is supposed to back the texture. You can’t specify that the same physical page gets mapped to two distinct textures. You can’t specify that a resource should be backed by memory from two different heaps.

Similar to the other APIs, there are a collection of restrictions which indicate to the application how they are expected to use the mapping: MTLDevice.sparseTileSizeInBytes, MTLDevice.sparseTileSize(), MTLDevice.convertSparsePixelRegions(), MTLDevice.convertSparseTileRegions(), and MTLTexture.firstMipmapInTail.

An application knows how big to make its allocation by calling functions on the MTLDevice: MTLDevice.heapTextureSizeAndAlign(), MTLDevice.recommendedMaxWorkingSetSize, MTLDevice.currentAllocatedSize.

Metal has a great feature called Texture Access Counters, which automatically counts how often tiles from textures are accessed. An application can read these counters by calling MTLBlitCommandEncoder.getTextureAccessCounters(). This means applications don’t have to do the bookkeeping they would have to do in the other two APIs.
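For illustration, here is a sketch of how an application might consume such per-tile access counts to pick eviction candidates. The counter array and least-accessed-first policy are assumptions for the example, not Metal API:

```cpp
#include <cstdint>
#include <vector>
#include <algorithm>

// Given one access count per resident tile (index = tile index), return the
// indices of the `howMany` least-accessed tiles, i.e. the best candidates to
// unmap when memory pressure rises.
std::vector<size_t> evictionCandidates(const std::vector<uint64_t>& accessCounts,
                                       size_t howMany) {
    std::vector<size_t> idx(accessCounts.size());
    for (size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    std::sort(idx.begin(), idx.end(), [&](size_t a, size_t b) {
        return accessCounts[a] < accessCounts[b]; // least-accessed first
    });
    if (howMany < idx.size()) idx.resize(howMany);
    return idx;
}
```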

Metal Shading Language includes new functions like sparse_sample() which returns additional information to let you know whether the samples were mapped. There’s also a LOD clamp parameter so an application can restrict its sampling to occur from a LOD that is guaranteed to be mapped.

OpenGL (just for fun)

It’s governed by two extensions: GL_ARB_sparse_texture (40%) and GL_ARB_sparse_buffer (45%). In ES, it's governed by GL_EXT_sparse_texture (1%).

Conclusion

There’s a lot of complexity here. Because Vulkan’s feature support is so complicated, we would have to figure out which subset we can expose that strikes a good balance of ubiquity and usefulness.

2 of the 3 APIs build sparse textures on top of heaps, but WebGPU has no concept of heaps. We would probably have to add support for heaps in order to get sparse resource functionality.

Metal has two requirements: the mappings for a resource must all come from a single heap, and a single tile can’t be mapped into two resources. These requirements will have to be incorporated into whatever we do here.
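A hypothetical sketch of how an implementation could validate those two requirements at map time (all names here are invented for illustration):

```cpp
#include <cstdint>
#include <map>
#include <utility>

using ResourceId = uint32_t;
using HeapTile = std::pair<uint32_t, uint64_t>; // (heap id, tile offset in heap)

class MappingValidator {
    std::map<ResourceId, uint32_t> resourceHeap_; // resource -> its single heap
    std::map<HeapTile, ResourceId> tileOwner_;    // physical tile -> sole owner
public:
    // Returns false if the mapping would violate either Metal-style rule.
    bool tryMap(ResourceId res, HeapTile tile) {
        auto heapIt = resourceHeap_.find(res);
        if (heapIt != resourceHeap_.end() && heapIt->second != tile.first)
            return false; // resource already bound to a different heap
        auto ownerIt = tileOwner_.find(tile);
        if (ownerIt != tileOwner_.end() && ownerIt->second != res)
            return false; // tile already backs another resource
        resourceHeap_[res] = tile.first;
        tileOwner_[tile] = res;
        return true;
    }
};
```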

We’ll also have to figure out how it interacts with compressed textures, if at all.

@litherum litherum changed the title Sparse Resources Investigation: Sparse Resources Oct 2, 2019
@Degerz

Degerz commented Oct 2, 2019

Honestly, sparse resources probably aren't worth baking into the API given the complexity and drawbacks. This feature seems to have fallen out of favour, and I don't believe it's the direction the industry is heading ...

Only being able to update the tile mappings from the CPU has a high fixed overhead cost. This makes many real-time use cases for this feature, such as virtual shadow mapping, impractical on a lot of current hardware implementations.

In Vulkan, some vendors also expose nonstandard block shapes. On AMD, I think prior to GCN 5/Vega their GPUs had nonstandard 3D block shapes like 64x64x4 for 32bpp formats. On PowerVR, none of their sparse resources use standardized block shapes.

@krogovin

I totally disagree with Degerz.

I would like to see sparse textures make their way into WebGPU (as an optional feature), with an interface for an application to query what the block size is, in order to facilitate implementation on many platforms. Each of the targeted APIs (Metal, Vulkan, and Direct3D 12) makes sparse textures optionally available on supported hardware.

Without sparse texturing, applications need quite convoluted and inefficient shader (and application) logic to get something like it. The obvious uses, atlasing and shadow maps, are a really big deal.

To stress, I am not advocating requiring the feature, I am advocating making it an optional feature.

@Degerz

Degerz commented Jun 16, 2020

You'd be surprised but the alternatives ended up being better in real world practice ...

Using sparse resources for shadow mapping is a bad idea, since you would have to update the sparse bindings every frame, which has a very high cost. You'd have to either stall the GPU or cope with the visual artifacts caused by introducing a frame of latency.

Even the vast majority of console games (including the ones with high-end graphics) avoid using sparse texturing for atlases or virtual texturing, and they instead opt for some sort of 'indirection' solution for those techniques.

There are practically no real-time use cases for this feature, so it would be mostly relegated to academic/research uses where authors don't care much about performance.

@krogovin

krogovin commented Jun 18, 2020

Even the vast majority of console games (including the ones with high-end graphics) avoid using sparse texturing for atlases or virtual texturing, and they instead opt for some sort of 'indirection' solution for those techniques.

The main issue for sparse texturing is that not all hardware implements it (with Intel on desktop being the biggest issue). As one who has implemented my own indirection solution to handle texture atlasing, I can say without a doubt that sparse texturing is oodles faster and more reliable. This is so for the following reasons:

  • doing an indirection (as found in the game Brink, for example) requires at least one additional texture lookup and breaks using the hardware sampler for some of the filtering, requiring either doing some of the filtering oneself and/or repeating texels at tile borders
  • doing a more traditional atlasing together with a shader to generate mipmaps requires using textureLod() and computing the LOD oneself. Computing that LOD means each pixel requires a log2 evaluation; in contrast, just calling texture() means the HW sampler computes the LOD. Lastly, and in truth the worst part, traditional texture atlasing gives a nasty 2D texture-packing problem, which is quite icky to adjust in real time
  • when sparse texturing is used, there is essentially zero hardware overhead because it is literally just taking advantage of the GPU's MMU. In addition, the HW filtering can be fully relied upon
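For reference, the manual LOD computation the second bullet describes might look like this sketch (a footprint-from-derivatives approximation invented for illustration, not any particular engine's code):

```cpp
#include <cmath>
#include <algorithm>

// Manual LOD selection, as a shader using textureLod() with an atlas must do:
// LOD = log2 of the larger screen-space texel footprint, estimated from the
// UV derivatives across x and y. The HW sampler does this for free in texture().
float manualLod(float dudx, float dvdx, float dudy, float dvdy,
                float texWidth, float texHeight) {
    float fx = std::hypot(dudx * texWidth, dvdx * texHeight); // footprint along x
    float fy = std::hypot(dudy * texWidth, dvdy * texHeight); // footprint along y
    // Clamp the footprint to 1 texel so LOD never goes negative.
    return std::log2(std::max(std::max(fx, fy), 1.0f));
}
```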

I have real-time use cases where sparse texturing is helpful. Given that it is available on desktop for both AMD and NVIDIA, along with some mobile GPUs, it is a really good idea to expose the HW feature as an extension or optional feature. Lastly, sparse texturing is part of Vulkan (as an optional feature), Metal, and Direct3D 12. Obviously, each of the parties that made those APIs thought it was worth making an optional feature. As such, it is pretty clear it should be an optional feature in WebGPU.

@alecazam

alecazam commented Oct 4, 2020

Hey Myles, this is a great analysis of VT across platforms. This feels like an important feature, since world density far exceeds GPU memory, and iOS and Android still jetsam the app without clear indicators of when that can occur memory-wise.

Xbox and PS5 seem to be taking the backing store, decompressing the tile data from zlib or Kraken, and then feeding that into a VT, as in the Spiderman demo where you move through a city without load times or stalls. Limiting/avoiding the fallback to smaller mips is key. Also, compressed blocks aren't small enough on disk without further compression applied.

I typically see a 2D texture divided into an atlas of many different tile sizes. This also allows all draw calls to proceed without having to change the texture in the sampler. That works if you do clamping, or wrap/mirror and padding on the edge, in the shader. One workaround is possibly to use a sparse 2D array texture that is also a VT, and then map tiles/mips in and out of a slice. The same 2D atlas technique could be applied to a slice, or a higher mip used for non-mipmapped data, or smaller mips with tail packing used on partial atlas mips that don't use the smallest levels. A texture view could be made of an array slice to turn it into a 2D texture, but this breaks the ability to render many draw calls using a megashader.

On desktop the tile size is 64 KB. On iOS the tile size is 16 KB. Tile dimensions by format:

  Format        Desktop    Mobile
  ASTC/BC7      256x256    128x128
  BC1/ETC r11   512x256    256x128

If this varies dramatically across systems, then that puts a lot of effort on content generation to break up mips per tile-size and format variant. And the larger the tile size, the less granular the data that can be swapped in/out. For example, mips below the tile size form a packed mip tail. This is all easier to implement when working with consoles, which have a single tile size.
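The packed-mip-tail arithmetic can be sketched as follows (this mirrors what Metal exposes as MTLTexture.firstMipmapInTail, but the function here is illustrative, not the API):

```cpp
#include <cstdint>

// First mip level whose dimensions drop below one tile: that mip and all
// smaller ones form the packed mip tail and can't be mapped individually.
uint32_t firstMipInTail(uint32_t width, uint32_t height,
                        uint32_t tileW, uint32_t tileH) {
    uint32_t mip = 0;
    while (width >= tileW && height >= tileH) {
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        ++mip;
    }
    return mip;
}
```

For a 4096x4096 texture with 256x256 tiles (desktop BC7), mips 0-4 are individually mappable and the tail starts at mip 5; with 128x128 tiles (mobile) the tail starts one level later, at mip 6.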

Indirection techniques that emulate VT hardware end up with complicated workarounds of padding that make the content generation and sampling too complicated. Sean Barrett's talk mentions several of these. Especially for aniso/mip lookup, relying on VT hardware is critical, but that's currently A13 and newer, plus macOS GPUs running Big Sur. Emulated VT also requires complex feedback from rendering (storing the missing mip LOD in a texture) and then a CPU-GPU stall to read back that data.

@Degerz

Degerz commented Feb 16, 2024

Here's somewhat recent data on how sparse resource binding performs on a high-end desktop GPU ...

[Chart: sparse resource binding performance measurements on a high-end desktop GPU]

@kainino0x kainino0x added the api WebGPU API label Apr 30, 2024